To F. J. H.


PROLOCUTION 


It will be no strange thing at all for some to dislike the 
matter of this work, and others to be displeased with 
the manner and method of it. Easily can I forsee that 
my account will be too long and tedious for some, while 
others, perhaps, may be apt to complain of its being too
short and concise.
Edmund Calamy 


Preface to the 2nd edition 


La dernière chose qu’on trouve en
faisant un ouvrage, est de savoir celle
qu’il faut mettre la première.


Blaise Pascal, Pensées. 


In the Preface to the first edition of his Grammar of Science Karl Pearson, 
with a cavalier approach to one of the niceties of conventional grammar, 
wrote 


There are periods in the growth of science when it is well to turn 
our attention from its imposing superstructure and to carefully 
examine its foundations. 


Since statistics is fundamental to all science, and since probability in turn 
is as necessary in the understanding and development of statistical tech- 
niques and theory as it is in life in general, it is necessary, I believe, for 
statisticians to heed Pearson’s dictum and to consider, at least from time 
to time, the foundations of their discipline. It is with this in mind that this 
work is offered, my particular concern being the examination of the devel- 
opment of one of the fundamental aspects of modern Bayesian Statistics. 

It is perhaps usual to find, in the second edition of almost any book, new 
results and other material that has come to light since the publication of 
the first edition. Things are slightly different with respect to the present 
work, however: here the reader will find discussion of the work of a number 
of authors that was omitted, for no good reason, from the first edition, and 
the inclusion of which here sheds more light on the use made by nineteenth 
century authors of inverse probability. 

More specifically, this second edition contains, in addition to the correc- 
tion of adventitious errors in the first edition, adscititious material (in vary- 
ing amounts) in §§4.5 (Bayes’s postulate and scholium), 5.4 (Michell), 6.3 
(Condorcet’s memoir), 7.15 (Laplace’s Théorie analytique des probabilités), 
8.2 (Lubbock and Drinkwater-Bethune), 8.4 (de Morgan), 8.5 (Bienaymé), 
8.6 (Ostrogradskii), 8.8 (Catalan), 8.10 (Cournot), 8.11 (Mill), 8.14 (Ellis), 
8.16 (Donkin), 9.8 (Edgeworth), 9.10 (Crofton), 9.13 (Bertrand) and 9.15
(Makeham). It also contains completely new sections (with appropriate 
notes) on Johann Heinrich Lambert, Pierre Simon Laplace’s Recherches 
sur le milieu, Lambert Adolphe Jacques Quetelet, Viktor Yakovlevitch 
Buniakovskii, Charles Saunders Peirce, Charles Lutwidge Dodgson, Henri 
Poincaré and Hugh MacColl. 


November, 1998 


Preface to the 1st edition 


It is thought as necessary to write a
Preface before a Book, as it is judged 
civil, when you invite a Friend to 
Dinner, to proffer him a Glass of Hock 
beforehand for a Whet. 


John Arbuthnot, from the preface 
to his translation of Huygens’s 
“De Ratiociniis in Ludo Aleæ”.


Prompted by an awareness of the importance of Bayesian ideas in modern 
statistical theory and practice, I decided some years ago to undertake a 
study of the development and growth of such ideas. At the time it seemed 
appropriate to begin such an investigation with an examination of Bayes’s 
Essay towards solving a problem in the doctrine of chances and Laplace’s 
Théorie analytique des probabilités, and then to pass swiftly on to a brief 
consideration of other nineteenth century works before turning to what 
would be the main topic of the treatise, videlicet the rise of Bayesian statis- 
tics from the 1950’s to the present day. 

It soon became apparent, however, that the amount of Bayesian work 
published was such that a thorough investigation of the topic up to the 
1980’s would require several volumes — and also run the risk of incurring 
the wrath of extant authors whose writings would no doubt be misrepre- 
sented, or at least be so described. It seemed wise, therefore, to restrict 
the period and the subject under study in some way, and I decided to con- 
centrate my attention on inverse probability from Thomas Bayes to Karl 
Pearson. 

Pearson was born in 1857 and died in 1936, and in a sense a watershed in 
statistics was reached during his lifetime. The somewhat cavalier approach 
to inverse probability that one finds in many writings in the century follow- 
ing the publication of Bayes’s Essay was succeeded in the fullness of time 
(even if destined only by Tyche) by the logical and personal approach to 
probability grounded on the works of Jeffreys, Johnson, Keynes, Ramsey 
and Wrinch in the first third of this century (and Jeffreys in fact gained 
his inspiration from Pearson’s Grammar of Science). At roughly the same 
time Fisher was making himself a statistical force; indeed, one can perhaps
view the rigorous development of Bayes’s work into a statistical tool
to be reckoned with as a reaction to Fisher’s evolution of sampling theory. 
The thirties also saw the birth of the Neyman-Pearson (and later Wald) 
decision-theoretic school, and subsequent work of this school was later in- 
corporated into the Bayesian set-up, to the distinct advantage of both. 

One must also note the rise of the biometric school, in which Pearson 
of course played a considerable rôle, and which owed its growth to the
appearance of Francis Galton’s Natural Inheritance of 1889 and his work on
correlation. This work also awoke Walter Frank Raphael Weldon’s interest 
in correlation, and he in turn did much to turn Pearson’s thoughts to evo- 
lution. William Sealy Gosset’s work c.1908 foreshadowed an attenuation in 
inverse probability, a tendency that was to be reversed only in the mid- 
twentieth century. 

It would not be too great a violation of the truth to say that, after 
roughly the beginning of this century, inverse probability took a back seat 
to the biometric, Fisherian and logical schools, from which it would only 
rise around 1950 with the work of Good and Savage and the recognition of 
the relevance of de Finetti’s earlier studies. Pearson, whose writings cover 
both inverse probability and what would today be grouped under “classi- 
cal” methods, seems then to be a suitable person with whom to end this 
study. 

Todhunter’s classic History of the Mathematical Theory of Probability 
was published in 1865. For reasons as to which it would be futile to specu- 
late here, nothing in similar vein, and of such depth, appeared for almost a 
century (I except books nominally on other topics but containing passages 
or chapters on the history of statistics or probability, anthologies of papers 
on this topic, and works on the history of social or political statistics and 
assurances) until David’s little gem of 1962. Several works in similar vein 
followed, the sequence culminating in Stigler’s History of Statistics of 1986 
and Hald’s History of Probability and Statistics, the latter appearing in 
1990 as the writing of this book nears completion (for trying to write a 
preface before the actual text is complete is surely as awkward as trying to 
“squeeze a right-hand foot into a left-hand shoe”).

Before I am carelessly castigated or maliciously maligned let me say what 
will not be found here. Firstly, there will be little biographical detail, apart 
from that in the first chapter on Thomas Bayes. Secondly, little will be 
found in the way of attempt at putting the various matters discussed in 
the “correct” historical and sociological context. To interpret early results 
from a modern perspective is at best misguided, and I lack the historian’s 
ability, or artifice, to place myself in the period in which these results were 
first presented. Those interested in these aspects will find abundant satis- 
faction in the Dictionary of National Biography, the Dictionary of Scientific 
Biography, and the books by Hald and Stigler cited above. Daston’s Clas- 
sical Probability in the Enlightenment of 1988 may also be useful: like the 
work by Hald it appeared too late to be consulted in the writing of this
text. 

Our aim is more modest — and the captious critic will no doubt opine 
with Winston Churchill that there is much to be modest about! It is to 
present a record of work on inverse probability (that is, crudely speaking, 
the arguing from observed events to the probability of causes) over some 
150 years from its generally recognized inception to the rise of its sample- 
theoretic and logical competitors. Since this is a record, it has been thought 
advisable to preserve the original notations and the languages used — at 
least almost everywhere. For while translations may well help the thought- 
ful reader, the serious scholar will need the original text to avoid being 
misled by the translator’s inability to render precise any nuances taxing 
his linguistic capabilities. 

Those who have read Augustus de Morgan’s A Budget of Paradoxes, or
any of his historical works, will recall his penchant for dwelling on the ob- 
scure and almost forgotten works of minor writers, an inclination that he 
once justified by writing 


names which are now unknown to general fame are essential to 
a sufficient view of history. [1855, p. 21] 


Since we too labour under this affliction, the reader will find here, in addi- 
tion to discussions of the pertinent writings of several luminaries, consid- 
eration of the works of those who are less well known, and whose light, if 
it ever shone at all, shone with only a few candle-power. The reasons for 
such consideration are threefold: first, that these lesser works, if pertinent, 
should not be relegated in perpetuity to obscurity; secondly, that the effect 
of the more overpowering light of their more famous confreres on the wider 
contemporary scientific community should be seen; and thirdly, that the 
reader might judge for himself whether the apparent obscurity to which 
they have been assigned is indeed warranted. It is to be hoped, though, 
that this consideration has not led to a book of which it can be said, as 
M.G. Kendall [1963] said of Todhunter’s magnum opus, that “it is just 
about as dull as any book on probability could be.” 

It is not claimed that this is the history of inverse probability: rather, it 
is one man’s view of the topic, a view, it is hoped, in which any peculiar- 
ities observed will be ascribed to innocent illusion rather than deliberate 
delusion, and in which the seeds of future research may be nurtured. 


Is there not something essentially diabolical
in keeping the impatient reader, even for
one moment, from the joys that await him? 


D. N. Brereton, introduction to 
Charles Dickens’s “Christmas Books”, 
British Books edition.
On inverse probability


It is indeed a thing so versatile and mul- 
tiform, appearing in so many shapes, so 
many postures, so many garbs, so variously
apprehended by several eyes and
judgments, that it seemeth no less hard to 
settle a clear and certain notice thereof, 
than to make a portrait of Proteus, or to 
define the figure of the fleeting air. 


Isaac Barrow. Sermon XIV. 
Against foolish talking and jesting. 


1.1 Introduction 


The task of an essayist is by no means an easy one. His work must be 
entertaining but not frivolous, topical yet possessive of a certain endur- 
ing quality, pungent but not acrid, enlightening but not prescriptive. The 
essayist must possess a wide general knowledge of contemporary as well 
as classical culture (these terms being interpreted in the broadest possi- 
ble sense), for often a pithy mot juste from an earlier writer will cast an 
unusual and unexpected light on an otherwise mundane observation. Such 
allusion, of course, should not be¹


merely corroborative detail, intended to give artistic verisimilitude 
to an otherwise bald and unconvincing narrative, 


for if so it can impart little import. 

William Hazlitt (1778-1830) was no mean essayist², and his writing is
as pleasurable and profitable to read today as it was, I am sure, more 
than a century ago. Among his essays that entitled On wit and humour of 
1818 carries a lengthy quotation on a similar subject from Isaac Barrow’s 
Sermons, from which the passage at the head of this chapter is taken. 
Now the beauty of many a quotation lies not only in the language in
which it is expressed, nor even in its appositeness in a particular context, 
but also in its possible applicability to a number of different situations. 
Thus it was that, on reading Hazlitt, I was struck by the relevance of 
Barrow’s quotation to probability. 

“All human knowledge,” said Russell, “is uncertain, inexact, and partial” 
[1948, p. 527]. If this be true (or at least as true as it can be, if it is self- 
referring), then the study of probability is of fundamental importance in 
the examination of scientific theories. So much has been written on the 
nature, interpretation and applicability of probability, that to add here to 
opinions on these matters would merely result in the heaping of Ossa upon 
Pelion. One particular aspect of this concept, however, has come to play a 
particularly important part in scientific inference, and it is to this notion, 
that of inverse probability, that this work is devoted; but before turning to 
this topic I would like to say something about inverse problems in general. 


1.2 Inverse problems


It has been suggested (see Grandy [1985, p. 2]) that in life one is continually 
confronted with inverse problems³; and while this is probably true, we shall
limit the discussion here to matters less ontological in nature. 

The phrase “inverse problems” is sometimes used in a rather restrictive 
sense, being interpreted as “inverse problems in mathematical physics” (see 
Romanov [1974, p. 1]). Here the aim is the determination of the coefficients 
of differential equations, ordinary or partial, using the known functionals 
of the solution. The problem is inverse to the “direct” problem in which 
solutions are found to given equations under specified boundary or initial 
conditions. Bertero gives the two problems as follows: 


The problem which consists in the determination of the map- 
ping from the set of all possible objects into the set of all pos- 
sible data is usually called the direct problem. ... 

the inverse problem is the determination of the object f from 
the measured data g. It corresponds to the inversion of the di- 
rect mapping ... [1986, pp. 52, 53] 


The description of inverse problems in geophysics is well-put by Barcilon 
as follows: 


Relying on well-understood physical laws, geophysicists have 
traditionally looked upon the Earth as a black box which pro- 
duces measurable outputs to various naturally applied inputs. 
Their task has been to infer the properties of the black box from 
measurements of these inputs. [1986, p. 1] 
Miodek, however, suggests in another context, and perhaps with tongue in
cheek, that 


The inverse scattering problem is the inverse of a direct scattering
problem which is of course called direct because it was
studied first. [1978, p. 298.]


Converting the real physical problem into a mathematical one, Jackson 
[1978] formulates the problem as 


\[ y = f(x_e) + e , \]


where y represents the experimental data, $x_e$ represents a set of unknown
parameters, f is an operation describing the theoretical values of the data, 
and e denotes all effects not explicitly modelled (a vector of errors). Three 
different usages of “inverse problems” are then mentioned, viz. 


1. the exact inverse problem, where one’s aim is to find an operator h 
that exactly inverts the operator f, 


2. the optimal inverse problem, in which an estimate of the unknown
parameters is sought that minimizes some objective function Q(x), and


3. the complete inverse problem, in which one attempts to find all pos- 
sible solutions satisfying the pertinent constraints, and, in addition, 
to find one solution that agrees as closely as possible with the data. 
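
The second of these usages can be made concrete with a minimal sketch. It rests on assumptions that are mine and not Jackson's: the operation f is taken to be a straight line in a known variable t, the objective function Q is taken to be the sum of squared residuals, and the names fit_line and objective, together with the data, are purely illustrative.

```python
# Hypothetical illustration of an "optimal inverse problem".
# Assumptions (not from Jackson): f(x) = x[0] + x[1] * t is linear, and
# Q(x) is the sum of squared differences between data and theory.

def fit_line(ts, ys):
    """Return (intercept, slope) minimizing Q(x) = sum((y - x0 - x1*t)**2)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    # Closed-form least-squares solution for a straight line.
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def objective(x, ts, ys):
    """Q(x): sum of squared differences between the data and f(x)."""
    return sum((y - (x[0] + x[1] * t)) ** 2 for t, y in zip(ts, ys))


# Made-up data: roughly y = 1 + 2 t, perturbed by small "errors" e.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
x_hat = fit_line(ts, ys)
print(x_hat, objective(x_hat, ts, ys))
```

The estimate returned is the one that makes Q smallest for these data; it need not coincide with the parameters (1, 2) used to generate them, which is precisely why the errors e must be carried in the formulation.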


Fertile fields for the labours of those interested in inverse problems are 
provided by many natural and physical sciences, for example geophysics 
(seismology, the inverse kinetic problem, the study of the Earth’s internal 
structure, potential theory, the determination of the hypocentres of earth- 
quakes), quantum mechanics (the inverse Sturm-Liouville problem), partial 
differential equations (the spectral inverse problem for Schrodinger’s equa- 
tion), medical diagnostics, atmospheric sounding, radar and sonar target 
estimation, radio-astronomy, microscopy, wave propagation, X-ray medical 
imaging, and statistics⁴.

When we come to the mathematical sciences we find de Morgan writing 


Every mathematical method has its inverse, as truly, and for 
the same reason, as it is impossible to make a road from one 
town to another, without at the same time making one from 
the second to the first. The combinatorial analysis is analysis 
by means of combinations; the calculus of generating functions 
is combination by means of analysis. [1842, p. 337] 


As an example of a general inverse problem in mathematics consider that 
posed by the linear Fredholm integral equation of the first kind 


\[ u(x) = \int_a^b K(x, y)\,U(y)\,dy , \tag{1} \]
whose solution U is to be found for a given class of functions u on a given
interval. This equation, according to Grandy, 


encompasses many mathematical inverse problems, from the 
inversion of integral transforms to the use of Cauchy’s theorem 
for determining a function in a region from its values on the 
bounding contour. [1985, p. 3] 


An inverse problem that is perhaps more probabilistic in nature concerns 
the determination of a function $f \in L^2(0,1)$ when a finite set of moments

\[ \mu_n = \int_0^1 x^{n-1} f(x)\,dx , \qquad n \in \{1, 2, \ldots, N\} \]

is given (the moment problem⁵).

A further connexion with statistics is provided by the “error calculus”; 
suppose, for example, that the measurement of P, V and T leads to the 
determination of R such that the Boyle-Mariotte law PV = RT holds. The 
inverse problem now reduces to the determination of the range of errors 
for R. That is, inverse problems specify a way of passing from data to 
parameters⁶.

In a study of Bayesian inversion of seismic data Duijndam states that 
since uncertainty always inheres in a practical inverse problem, the appro- 
priate formulation of such a problem should take place within probability 
theory. While this statement provides a suitable link for the passage from 
inverse problems in general to inverse probability, one must bear in mind 
that not all authors would agree that probability is the only (or even an 
appropriate) vehicle for the conveying of uncertainty⁷.

The nascence of inverse problems in probability was by no means a 
speedy process. The actual term inverse probability was, as we shall see 
in Chapter 8, first used in English by Augustus de Morgan in the 1830’s. It 
is interesting, and important (as Edwards [1997] has sagaciously stressed), 
to note that while de Morgan referred to the “inverse method”, earlier writ- 
ers had stressed the problem rather than the method: thus, as we shall see, 
Hartley [1749] wrote of “a Solution of the inverse Problem”, while Price, 
in his introduction to Bayes’s Essay, mentioned “the converse problem”. 

In his article of 1843 in the Encyclopedia Metropolitana de Morgan pre- 
sented two inverse principles (discussed in §8.4 of the present work) and 
stated, after the second of these, that “[it] is not in reality different from the 
one first stated” (that first principle, in turn, being nothing more than the 
giving of a probability as the ratio of the number of favourable cases to the 
total number of cases when all events are equally probable). Francis Ysidro 
Edgeworth [1911, §13, Note 10] referred to J. Cook Wilson’s exhibiting of 
the “essential symmetry” of these two methods in 1900, though the latter 
actually claimed to do no more than provide a proof of the discrete Bayes’s 
Formula as rigorous as those of “ordinary” probability. 
Basic to the definition of inverse probability is Bayes’s Theorem, though
as I shall suggest in Chapter 5, one might well view Joseph Louis de la 
Grange as the first to use Bayes’s result in a statistical setting. The first to 
give a precise formulation of inverse probability, however, was Pierre-Simon 
Laplace, who, in his memoir Sur la probabilité des causes of 1774, gave an
exact formulation of the problem (Grandy [1985, p. 11] in fact considers 
Laplace to have been “the first to formulate the inverse problem in a careful 
scientific context” (emphasis added)). Laplace’s result can be phrased as 
follows: suppose that an event $E$ of positive probability can be produced
by any one of a number of mutually exclusive and exhaustive causes $C_i$,
each of positive probability. Then for each $i$

\[ \Pr[C_i \mid E] = \Pr[E \mid C_i]\,\Pr[C_i] \Big/ \sum_j \Pr[E \mid C_j]\,\Pr[C_j] . \]
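
A small numerical illustration of the formula just stated may be helpful. The sketch below is not drawn from Laplace's memoir: the function name posterior and the two-urn example are hypothetical, chosen only to show how the probabilities of the causes are revised once the event has been observed.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return Pr[C_i | E] for each cause, given Pr[C_i] and Pr[E | C_i]."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # this is Pr[E], assumed to be positive
    return [j / total for j in joint]


# Hypothetical example: one of two urns is chosen with equal probability;
# a white ball is drawn with probability 3/4 from the first urn and with
# probability 1/4 from the second. The event E is "a white ball is drawn".
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(3, 4), Fraction(1, 4)]
post = posterior(priors, likelihoods)
print(post)
```

Exact rational arithmetic is used so that the revised probabilities can be checked by hand against the formula.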


What is it that makes this a problem in inverse probability, as opposed to 
one in direct probability? 


1.3 Inverse probability


In this book the study of inverse probability is begun with the seminal paper 
An essay towards solving a problem in the doctrine of chances by Thomas 
Bayes, posthumously published in 1764. Although the first explicit proof 
of the major result in inverse probability is to be found here, some attempt 
at the proof of such a result had been made before Bayes’s successful foray, 
and it might not be taken amiss if we explore these hesitant, if unsuccessful, 
early attempts here⁸.

In 1713 Jakob Bernoulli’s Ars Conjectandi appeared. In the fourth part 
of this posthumously published work Bernoulli notes that sometimes (usu- 
ally in games of chance) knowledge of the numbers of cases involved is 
sufficient to determine probabilities⁹:


But here it seems to me that we are at a loss, since one is at 
liberty to do this only just in very few cases, and indeed one 
may hardly succeed elsewhere other than in games of chance, 
the first inventors of which, doing their best to bring about 
fairness, arranged things for themselves in such a way that the 
numbers of cases in which gain or loss ought to follow, might be 
definite and known, and that all these cases might happen with 
equal facility. For in most other situations depending either on 
the working of nature or on the judgement of men, this is by 
no means the case. [p. 223] 


He further contrasts the ease of obtaining numbers in games of chance 
with the difficulty (if not the impossibility) of determining the number of
diseases that might afflict the human body, and states that in such a case¹⁰


Verily to be sure, another way is open to us here, by which we 
may obtain that which is sought; & what it is not granted to 
find out a priori, it will at any rate be permitted to extract a
posteriori, that is, from a result perceived many times in similar
instances; since it ought to be assumed that every single thing 
is able to happen and not to happen in future in as many cases 
as it will have been observed formerly in similar circumstances 
to have occurred and not to have occurred. [p. 224] 


The problem is by no means an easy one, as Bernoulli points out¹¹:


This therefore is that problem, which I have proposed worthy of 
being published in this place, after I have suppressed it till now 
for twenty years, and of which not only the novelty, not only 
the very great utility, but also the concomitant difficulty, is able 
to superadd weight and worth to all the remaining chapters of 
this doctrine. [p. 227] 


The major part of the solution is¹²


Therefore let the number of fruitful [successful] cases to the 
number of unfruitful [unsuccessful] cases be either exactly or 
approximately in the ratio r/s, and to the same degree to the 
total number in the ratio r/(r +s) or r/t, the limits (r + 1)/t 
& (r - 1)/t determine [restrict] this ratio. Our task is to show,
that one may run so many trials, that, given as many times as 
you like (say c), it emerges as more likely that the number of 
successful cases will fall within rather than outside these limits, 
h. e. the number of successful to the number of all observations 
will have a ratio neither greater than (r + 1)/t, nor less than
(r - 1)/t, [p. 236]


a result that one could today phrase as follows: for given c one can find
$n = n_S + n_F$ such that

\[ \Pr[\,|n_S/n - r/t| \le 1/t\,] : \Pr[\,|n_S/n - r/t| > 1/t\,] :: c : 1 . \]
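
The content of this statement can be checked numerically. What follows is a minimal sketch under assumptions of my own, and certainly not Bernoulli's method of proof: the trials are taken to be independent with success probability r/t, the two probabilities are computed exactly from the binomial distribution, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_within(n, r, t):
    """Exact Pr[|n_S/n - r/t| <= 1/t] for n trials with success chance r/t."""
    s = t - r
    # |k/n - r/t| <= 1/t is equivalent to |k*t - r*n| <= n, which keeps the
    # comparison in integers.
    favourable = sum(
        comb(n, k) * r**k * s ** (n - k)
        for k in range(n + 1)
        if abs(k * t - r * n) <= n
    )
    return Fraction(favourable, t**n)


def odds_within(n, r, t):
    """Ratio Pr[within the limits] : Pr[outside]; None if nothing is outside."""
    inside = prob_within(n, r, t)
    return None if inside == 1 else inside / (1 - inside)


# Illustrative values r = 3, s = 2, t = 5: the limits are 2/5 and 4/5.
for n in (5, 25, 50):
    print(n, float(odds_within(n, 3, 5)))
```

For these illustrative values the odds in favour of falling within the limits grow as n increases, which is the behaviour the theorem asserts: any prescribed c is eventually exceeded.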


It should be noted that p = r/t is the probability of a success; and while p 
is often taken to be a population frequency (which is probably a fair con- 
clusion to draw from the statement of the theorem), this is not explicitly 
stated¹³.

It has been suggested, I think ill-advisedly, that Bernoulli himself pro- 
posed an inverse use of this theorem. For instance, the argument given in 
Todhunter [1865, art. 125] runs as follows¹⁴: suppose that an urn contains
white and black balls in an unknown ratio, and suppose that R + S draws
from this urn result in R white and S black balls. Then the ratio of white
to black balls should (according to Bernoulli, claims Todhunter) be taken
as approximately R: S. 


Now after the argument detailed above Bernoulli continues¹⁵


Whence finally this singular result is seen to follow, that if ob- 
servations of all events were to be continued through all eternity 
(the probability finally ending in complete certainty) all hap- 
penings in the world would be observed to occur in fixed [defi- 
nite] ratios and according to a constant law of change; to such 
a degree that even in the most accidental and fortuitous hap- 
penings we would be bound to recognize [acknowledge] a sort 
of inevitability as it were and, so to say, a necessity ordained 
by fate. [p. 239] 


No further discussion of this point is forthcoming, and while one may per- 
haps deduce the intent to apply the theorem in an inverse manner, I doubt 
that the presence of an explicit result can be found¹⁶.

On Bernoulli’s death, therefore, one was left with a careful proof of the 
direct theorem and perhaps a hint at the inverse result. 

Some forty years after Bernoulli’s death in 1705 David Hartley published 
his Observations on Man, His Frame, His Duty, And His Expectations. 
Here a passage that has bearing on whether or not the result that today 
(sometimes) bears the name “Bayes’s Theorem” is correctly named may be 
found¹⁷. It runs as follows:


Mr. de Moivre has shewn, that where the Causes of the Hap- 
pening of an Event bear a fixed Ratio to those of its Failure, 
the Happenings must bear nearly the same Ratio to the Fail- 
ures, if the Number of Trials be sufficient; and that the last 
Ratio approaches to the first indefinitely, as the Number of Tri- 
als increases ... An ingenious Friend has communicated to me 
a Solution of the inverse Problem, in which he has shewn what 
the Expectation is, when an Event has happened p times, and 
failed q times, that the original Ratio of the Causes for the 
Happening or Failing of an Event should deviate in any given 
Degree from that of p to q. And it appears from this Solution,
that where the Number of Trials is very great, the Deviation 
must be inconsiderable: Which shews that we may hope to de- 
termine the Proportions, and, by degrees, the whole Nature, of 
unknown Causes, by a sufficient Observation of their Effects. 


[pp. 338-339] 


The first part of this passage refers to Bernoulli’s Theorem as generalized 
by de Moivre (as we shall see later), while the second gives a clear statement 
of an inverse result. If we replace p and q by $n_S$ (the number of successes
or “happenings”) and $n_F$ (the number of failures) respectively, then the
communication of Hartley’s “ingenious Friend” can be written¹⁸

\[ E\bigl[\,|n_S/n_F - p_S/p_F| = \varepsilon \,\bigm|\, n_S, n_F\,\bigr] \quad (\forall \varepsilon > 0), \]

where $p_S$ and $p_F$ denote the causes for the success or failure. Notice that
the numbers $n_S$ and $n_F$, as well as the “given Degree” (as measured by $\varepsilon$),
are distinctly stated to be known, while the initial ratio of (the numbers 
or probabilities of) causes is unknown. Incidentally, it is not known who 
Hartley’s “ingenious Friend” was; for various suggestions see Dale [1988b], 
Edwards [1986] and Stigler [1983]. 

De Moivre’s Doctrine of Chances first appeared in 1718; the second and 
third editions of 1738 and 1756 carried two passages in which de Moivre 
argued from frequencies to probabilities. The first of these is to be found 
in a corollary to Problem LXXII of the third edition: 


if after taking a great number of Experiments, it should be 
observed that the happenings or failings of an Event have been 
very near a ratio of Equality, it may safely be concluded, that 
the Probabilities of its happening or failing at any one time 
assigned are very near equal. [1756, pp. 240-241] 


Problem LXXIII contains a generalization of this result, and is followed by 
a corollary stating 


if after taking a great number of Experiments, it should be 
perceived that the happenings and failings have been nearly 
in a certain proportion, such as of 2 to 1, it may safely be 
concluded that the Probabilities of happening or failing at any 
one time assigned will be very near in that proportion, and 
that the greater the number of Experiments has been, so much 
nearer the Truth will the conjectures be that are derived from
them. [1756, p. 242] 


To this edition of The Doctrine of Chances de Moivre attached a translation
of his 1733 pamphlet Approximatio ad Summam Terminorum Binomii
$(a+b)^n$ in Seriem expansi, in which he considered “the hardest Problem
that can be proposed on the Subject of Chance” [1756, p. 242] — a prob- 
lem that is essentially the inverse of Bernoulli’s Theorem. In this pamphlet 
de Moivre establishes a number of results providing limits for deviations of 
given probabilities from observed numbers of occurrences. One such result 
yields 


\[ \Pr\bigl[\,|n_S/n - p_S| < l/n \,\bigm|\, p_S, n\,\bigr] , \]


where $l$ is a given number, and this seems to be the result that Hartley
attributes to de Moivre. 

In a remark following the translation of the Approximatio de Moivre
presents what is essentially an inverse argument, viz.


As, upon the Supposition of a certain determinate Law according
to which any Event is to happen, we demonstrate that the
Ratio of Happenings will continually approach to that Law, as 
the Experiments or Observations are multiplied; so, conversely, 
if from numberless Observations we find the Ratio of the Events 
to converge to a determinate quantity, as to the Ratio of P to
Q; then we conclude that this Ratio expresses the determinate
Law according to which the Event is to happen. [1756, p. 251] 


There is, I believe, a distinction to be preserved between the inversion of 
Bernoulli’s Theorem and Bayes’s Theorem¹⁹. Under the former umbrella
we include results essentially advocating the estimation of an unknown 
probability p by an observed frequency x/n in a large number of trials, the
approximation being effected by consideration of

\[ \Pr[\,|x/n - p| < \varepsilon\,] . \]

While this is certainly similar to the quaesitum in Bayes’s Theorem, the
endpoints of the interval $(p_1, p_2)$, in which p is constrained to lie, appearing
in that result are not necessarily functions of x and n. I believe that Hartley’s
result is more in keeping with the inverse Bernoulli Theorem than with
Bayes’s Theorem, and that while both Bernoulli and de Moivre gave sound 
arguments for the inference from known probabilities to observed frequen- 
cles, their attempts at results in the opposite direction were ill-expressed. 
Laplace later repeated Bayes’s result (though probably in ignorance of his 
clerical predecessor’s work) and also gave a proof of Bernoulli’s Theorem 
from which a converse result was deduced. Thus even if the idea of an ar- 
gument in inverse probability was not original to Bayes, the method to be 
employed owes much, if not indeed everything, to his labours. 

After these considerations by Hartley and de Moivre the next to tackle 
the problem was Thomas Bayes. As his work will be examined in detail in 
subsequent chapters, it will be sufficient merely to sketch some pertinent 
points here. 

The problem with which Bayes is concerned is the following: 


Given the number of times in which an unknown event has 
happened and failed: Required the chance that the probability 
of its happening in a single trial lies somewhere between any 
two degrees of probability that can be named. [p. 376] 


To solve this Bayes requires a postulate given at the beginning of the second 
part of his Essay²⁰: 


1. I Suppose the square table or plane ABCD to be so made 
and levelled, that if either of the balls o or W be thrown upon 
it, there shall be the same probability that it rests upon any one 
equal part of the plane as another, and that it must necessarily 
rest somewhere upon it. 
2. I suppose that the ball W shall be 1st thrown, and through 
the point where it rests a line os shall be drawn parallel to AD, 
and meeting CD and AB in s and o; and that afterwards the 
ball O shall be thrown p+q or n times, and that its resting 
between AD and os after a single throw be called the happening 
of the event M in a single trial. [p. 385] 


As Edwards [1978] has indicated, the assumption that the table is uniform 
is unnecessary. Moreover, it should be noted that Bayes’s fundamental as- 
sumption was that the number of successes had a discrete equiprobable 
distribution (see Good [1979]). 

The main result, given as Proposition 10 in the Essay, may be stated as 
follows: let x be the prior probability of an unknown event A. Then 


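\[ \Pr[x_1 < x < x_2 \mid A \text{ has happened } p \text{ times and failed } q \text{ times in } p+q \text{ trials}] \]


\[ = \int_{x_1}^{x_2} \binom{p+q}{p} x^p (1-x)^q \, dx \bigg/ \int_0^1 \binom{p+q}{p} x^p (1-x)^q \, dx. \]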
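
The ratio of integrals in Proposition 10 can be evaluated exactly when p and q are whole numbers. What follows is only a minimal sketch in Python, under the assumption of the uniform prior on x implied by Bayes's postulate; the function names are hypothetical and belong to no library. It expands (1 − x)^q by the binomial theorem and integrates term by term, the common binomial coefficient cancelling between numerator and denominator.

```python
from fractions import Fraction
from math import comb


def integral_xp_1mx_q(p, q, a, b):
    """Exact integral of x**p * (1 - x)**q over [a, b].

    (1 - x)**q is expanded by the binomial theorem and each power of x
    is integrated separately, so the result is an exact Fraction.
    """
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for k in range(q + 1):
        coeff = comb(q, k) * (-1) ** k
        power = p + k + 1
        total += Fraction(coeff, power) * (b ** power - a ** power)
    return total


def bayes_interval_probability(p, q, x1, x2):
    """Pr[x1 < x < x2 | p successes, q failures], uniform prior assumed.

    The factor C(p+q, p) is common to both integrals and is omitted.
    """
    numerator = integral_xp_1mx_q(p, q, x1, x2)
    denominator = integral_xp_1mx_q(p, q, 0, 1)
    return numerator / denominator


# One success, no failures, interval (1/2, 1):
print(bayes_interval_probability(1, 0, Fraction(1, 2), 1))
```

With p = 1, q = 0 and the interval (1/2, 1) the numerator is 3/8 and the denominator 1/2, so the function returns 3/4.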


The next major step was taken by Richard Price, communicator of 
Bayes’s paper to the Royal Society on the latter’s death, who added an 
Appendix to Bayes’s Essay in which application of Bayes’s results was 
made to future events. The following will serve as an example: let M be an 
event concerning whose probability x nothing is known, antecedent to any 
trials. Then, by Bayes’s result, 


Pr[(1/2) < x < 1 | M has occurred once] = 3/4. 


Next, 


Let us first suppose, of such an event as that called M in the 
essay, or an event about the probability of which, antecedently 
to trials, we know nothing, that it has happened once, and that 
it is enquired what conclusion we may draw from hence with 
respect to the probability of it’s happening on a second trial. 
The answer is that there would be an odds of three to one for 
somewhat more than an even chance that it would happen on 
a second trial. [p. 405] 


But how does all this fit in with inverse probability? To answer this 
question the following example may be useful: suppose that a number of 
diseases D₁, D₂, ..., Dₘ can be “associated” with a number of symptoms²¹ 
S₁, S₂, ..., Sₙ. A patient exhibiting symptom Sᵢ visits his doctor. Now 
Pr[Sᵢ|Dⱼ], the probability that a patient with disease Dⱼ will manifest 
symptom Sᵢ, is presumably known. This is a direct probability (the disease 
causes the symptom). The object of interest is Pr[Dⱼ|Sᵢ], the probability 
that the patient with symptom Sᵢ has disease Dⱼ. This is an inverse probability 
(the symptom does not cause the disease). 
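
The passage from the direct to the inverse probability in this example may be sketched numerically. The code below is illustrative only: it assumes that the diseases are mutually exclusive and exhaustive and that prior probabilities for them are available (neither assumption is stated in the example itself), and every number and name in it is hypothetical.

```python
def posterior_over_diseases(prior, likelihood):
    """Return Pr[D_j | S_i] for each disease D_j.

    prior      -- dict: disease -> Pr[D_j]        (assumed known here)
    likelihood -- dict: disease -> Pr[S_i | D_j]  (the direct probabilities)
    The diseases are assumed mutually exclusive and exhaustive.
    """
    joint = {d: prior[d] * likelihood[d] for d in prior}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the observed symptom is impossible under every disease")
    return {d: value / total for d, value in joint.items()}


# Hypothetical figures, chosen purely for illustration.
prior = {"D1": 0.70, "D2": 0.25, "D3": 0.05}
likelihood = {"D1": 0.10, "D2": 0.40, "D3": 0.90}  # Pr[S_i | D_j]

posterior = posterior_over_diseases(prior, likelihood)
for disease, prob in posterior.items():
    print(disease, round(prob, 4))
```

In this made-up case the disease under which the symptom is most probable (D3) is not the one rendered most probable by the symptom (D2), which is the distinction between the direct and the inverse probability.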

The place of inverse probability in statistics has been well-summarized 
by Bowley, who, in referring to previous chapters in his book, wrote 


the problems of the errors that arise in the process of sampling 
have been chiefly discussed from the point of view of the uni- 
verse, not of the sample; that is, the question has been how far 
will a sample represent a given universe? The practical question 
is, however, the converse: what can we infer about a universe 
from a given sample? This involves the difficult and elusive the- 
ory of inverse probability, for it may be put in the form, which 
of the various universes from which the sample may a priori 
have been drawn may be expected to have yielded that sample? 
[1926, p. 409] 


One can thus view the obtaining of an inverse probability in a crude 
way as the finding of the probability of a cause from the occurrence (or 
observation) of an effect; and as Virgil [Georg. ii. 490] said²², 


Felix qui potuit rerum cognoscere causam. 


With goodwill one can then see Bayes’s original problem as the obtaining of 
the probability that a value x lies in a certain interval (the cause) given the 
result of an experiment (the effect), while the second of Price’s examples 
cited above (one that relies on Bayes’s Theorem for its solution) gives the 
probability of a further observation given certain data. 

To explore a bit further the connexion between Bernoulli’s Theorem, its 
inverse and Bayes’s Theorem²³, consider a binary experiment with constant 
probability p of success, and suppose that n independent trials have been 
run. If Sₙ is a random variable denoting the number of successes obtained, 
then 


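\[ \Pr[S_n = s \mid n, p] = \binom{n}{s} p^s (1-p)^{n-s}, \qquad s = 0, 1, \ldots, n. \tag{2} \]

Bernoulli’s Theorem then declares that, as n → ∞, the observed frequency 
f = s/n of successes tends to p in the sense that 

\[ (\forall \varepsilon > 0) \quad \Pr[\,|f - p| < \varepsilon\,] \to 1. \]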


However this result does not say how large n must be to reach any specified 
accuracy. The answer to this problem is given by the de Moivre-Laplace 
limit theorem (see Feller [1957, §VII.2]), from which one finds that 


\[ \Pr[df \mid p, n] \sim \left[ \frac{n}{2\pi p(1-p)} \right]^{1/2} \exp\left[ -\frac{n(f-p)^2}{2p(1-p)} \right] df. \tag{3} \]
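
The quality of the de Moivre-Laplace approximation may be examined numerically. The sketch below (function names hypothetical, values of n and p chosen arbitrarily) compares the exact binomial probability (2) of s successes with the Normal density for f = s/n multiplied by the spacing df = 1/n between successive attainable frequencies.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(s, n, p):
    """Exact Pr[S_n = s | n, p], as in the binomial distribution (2)."""
    return comb(n, s) * p ** s * (1 - p) ** (n - s)


def normal_approx_pmf(s, n, p):
    """de Moivre-Laplace approximation to Pr[S_n = s | n, p].

    The Normal density of f = s/n is multiplied by df = 1/n, the gap
    between neighbouring values of the observed frequency.
    """
    f = s / n
    variance = p * (1 - p)
    density = sqrt(n / (2 * pi * variance)) * exp(-n * (f - p) ** 2 / (2 * variance))
    return density / n


n, p = 400, 0.3  # illustrative values only
for s in (110, 120, 130):
    exact = binomial_pmf(s, n, p)
    approx = normal_approx_pmf(s, n, p)
    print(s, exact, approx, abs(exact - approx) / exact)
```

The relative error is smallest near s = np and grows as s moves into the tails, as one would expect of an approximation that ignores the skewness of the binomial distribution.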

Bayes’s Theorem, on the other hand, under the conditions detailed above, 
leads to the beta-distribution 


\[ B(p) = \Pr[dp \mid s, n] = (n+1) \binom{n}{s} p^s (1-p)^{n-s} \, dp. \tag{4} \]

Now ln B(p) is maximized by p = s/n = f, and on expanding ln B(p) in a 
Taylor series about p = f we get 


\[ \ln B(p) \sim \left\{ \ln\frac{(n+1)!}{s!\,(n-s)!} + s \ln f + (n-s)\ln(1-f) \right\} - \frac{n(p-f)^2}{2f(1-f)}, \]


whence 


\[ B(p) \sim (n+1) \binom{n}{s} f^s (1-f)^{n-s} \exp\left[ -\frac{n(p-f)^2}{2f(1-f)} \right] dp. \]


On replacing the factorials by their approximations given by the Stirling-de 
Moivre formula, and on using the facts that, for large n, 


\[ (n+1)^{n+1+1/2} \sim e\, n^{n+1+1/2}, \]


one gets 


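\[ B(p) \sim \left[ \frac{n}{2\pi f(1-f)} \right]^{1/2} \exp\left[ -\frac{n(p-f)^2}{2f(1-f)} \right] dp. \tag{5} \]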


Comparison of (5) with (3) shows the symmetry between the probability 
of p given f and of f given p, (see Jaynes [1979, p. 19]), and gives a clear 
solution of Bernoulli’s inversion problem in the Normal case. 
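
The Normal approximation to the beta-distribution may likewise be checked numerically. The following is a rough sketch, with hypothetical function names and arbitrary values of s and n; the exact density is computed through logarithms of factorials (via `math.lgamma`) to avoid overflow.

```python
from math import exp, lgamma, log, pi, sqrt


def beta_posterior_density(p, s, n):
    """Exact density (n+1) C(n,s) p**s (1-p)**(n-s), as in (4), without dp."""
    # (n+1) * n! / (s! (n-s)!) = (n+1)! / (s! (n-s)!)
    log_coeff = lgamma(n + 2) - lgamma(s + 1) - lgamma(n - s + 1)
    return exp(log_coeff + s * log(p) + (n - s) * log(1 - p))


def normal_approx_density(p, s, n):
    """Normal approximation centred on f = s/n with variance f(1-f)/n."""
    f = s / n
    variance = f * (1 - f)
    return sqrt(n / (2 * pi * variance)) * exp(-n * (p - f) ** 2 / (2 * variance))


s, n = 300, 1000  # illustrative values only
for p in (0.28, 0.30, 0.31):
    exact = beta_posterior_density(p, s, n)
    approx = normal_approx_density(p, s, n)
    print(p, exact, approx)
```

The approximating density has the same form as the de Moivre-Laplace density for f given p, with the roles of f and p interchanged; this is the symmetry remarked upon in the text.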

When we come to Laplace’s simple expression 


\[ \Pr[C_i \mid E] = \frac{\Pr[E \mid C_i]}{\sum_j \Pr[E \mid C_j]}, \tag{6} \]


we find that Pr[E|Cᵢ] corresponds to (Bernoulli’s) binomial distribution (2), 
while in the limit as n → ∞ (6) corresponds to Bayes’s beta-distribution. 
Note that Jaynes [1979, p. 20] considers Laplace’s general result 


\[ \Pr[C_i \mid E \wedge H] = \frac{\Pr[E \mid C_i]\,\Pr[C_i \mid H]}{\sum_j \Pr[E \mid C_j]\,\Pr[C_j \mid H]} \tag{7} \]

(where H denotes prior information) as the correct and unique solution to 
the inversion problem. 

It might be noted that inverse probability and inverse problems are differ- 
ent in nature. To see this, write (7) in the form of a posterior distribution, 


\[ \pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}. \tag{8} \]

While the inverse problem (1) requires the finding of the term U(·) in the 
integrand, no similar finding of π(·) is even hinted at in the consideration 
of (8). 

The principle of inverse probability is an easy consequence of the product 
rule for probabilities, and it is framed as follows by Jeffreys [1973, §2.3]: 
let p represent the initial data, let θ be a set of additional data, and let 
q₁, ..., qₙ be a set of hypotheses. Then²⁴ 


\[ \frac{\Pr[q_r \mid \theta \wedge p]}{\Pr[q_r \mid p]\,\Pr[\theta \mid q_r \wedge p]} = \frac{1}{\Pr[\theta \mid p]} \]


is the same for all qᵣ. This is perhaps a slightly unusual form: however 
Jeffreys then goes on to note that, under the assumption that the qᵣ are 
mutually exclusive and exhaustive, the above result becomes 


\[ \Pr[q_r \mid \theta \wedge p] = \frac{\Pr[q_r \mid p]\,\Pr[\theta \mid q_r \wedge p]}{\sum \Pr[q_r \mid p]\,\Pr[\theta \mid q_r \wedge p]}. \]


The same principle is given in a slightly different form in Jeffreys [1961] as 
\[ \Pr[q_r \mid \theta \wedge p] \propto \Pr[q_r \mid \theta]\,\Pr[p \mid q_r \wedge \theta], \]


or 
posterior ∝ prior × likelihood. 


(One might note here that Perks [1947] remarks that while Bayes’s Theorem 
is indisputable on any theory of probability when the prior probabilities 
are known, controversy arises as soon as the priors are unknown²⁵.) 

While most of those who accept the doctrine of inverse probability would, 
I believe, agree in the main, if not completely, with the preceding discussion, 
there are those whose interpretations are somewhat different: Lancaster 
[1994, p. 206], for example, divides the probability calculus into two parts 
— the direct and the inverse — and states that the latter is fundamentally 
based on the law of large numbers (or more precisely on the central limit 
theorem), the probability of an event being deducible from data and the 
error estimated. 

Exploring the connexion between probability and information, with spe- 
cial reference to the biological sciences, Wrighton [1973] avers that the 
inverse problem in probability may be formulated in two ways. The first 
of these, the analytical formulation, is typified by the drawing of balls at 
random from an urn according to a specified sampling procedure. Our aim 
is to find observed properties of the sample that allow the drawing of infer- 
ences about the true contents of the urn. As an example of the second, the 
prospective formulation, consider the same urn of unknown composition as 
before. Our aim is now to determine a plan to be drawn up for the em- 
ployment of a given sampling scheme, when we want to find out something 
about the urn’s contents. Bayes’s inverse method is seen by Wrighton as 
an example of the analytical method: the prospective formulation is seen 
in the inverse form of Bernoulli’s Theorem. Further, problems in inverse 
probability should be seen from the prospective rather than the analytic 
point of view, prior probabilities not being ascribed to the possible contents 
of the urn, but our concern being rather with the prior specification of 
possible constitutions. Wrighton notes with amazement [p. 37] that little 
cognisance seems to have been taken even of the possible existence of any 
alternative to the analytical approach before the 1930’s. Given the scope 
of the present work, it will therefore not be surprising that our emphasis 
here is not on the prospective formulation. 

In a sense all statistical inference is based on the idea of inversion. In- 
deed, Chuaqui [1991] establishes two principles in his discussion of decision 
theory and statistical inference, the second of which, the Inverse Infer- 
ence Principle, is of particular use in arguing from evidence (or knowledge 
of the occurrence of an event) to hypotheses. This principle is concerned 
with rules for the rejection and acceptance of hypotheses, and is viewed by 
Chuaqui as the fundamental way in which our degrees of belief are changed. 
Despite his concentration on these two principles, however, Chuaqui does 
state that Bayes’s formula may be seen as a form of inverse inference. 

Consideration of the probabilistic nature of a model and the effect of 
random factors conduces to the obtaining of information about the deriva- 
tion of effects from causes, and Bayes’s Theorem is ideally suited to the 
examination of such an inversion²⁶ (recall our earlier remarks on inverse 
methods versus inverse problems). Of course this result plays a significant 
part not in subjective theories of probability alone, where its role in the 
updating and improving of one’s prior opinions and beliefs is paramount: it 
appears in classical statistics, though perhaps more often here as a “mere” 
theorem, and also enters into objective (or necessary or logical) theories, 
in which the prior is supposed to be uniquely determined by some formula. 
The prior distribution being regarded as a posterior distribution obtained 
after the acquiring of the prior information, Bayes’s Theorem may be used 
in what might be described as a reverse direction to argue back by deducing 
the prior from the posterior and thus to reach a state of no information²⁷. 
Indeed, one cannot but agree with Jeffreys that 


The fundamental problem of scientific progress, and a funda- 
mental one of everyday life, is that of learning from experience. 
[1961, p. 1] 


Estimation is an important part of statistical inference, and a major role 
is played in that topic by the Method of Maximum Likelihood. The con- 
nexion between this method and inverse probability seems cloudy, however, 
and conflicting opinions are to be found in print. Thus Edwards writes 


the Method of Maximum Likelihood is analytically identical to 
the method of inverse probability if a uniform prior distribution 
is adopted [1972, pp. 97-98], 
while Hartigan says “A ... non-Bayesian method is maximum likelihood” 
[1983, p. 91], a statement that seems to clash with Edwards’s when the importance 
of inverse probability in Bayesian methods is recalled. Hartigan 
also notes (op. cit., p. 116) that under regularity conditions of asymptotic 
normality, Bayesian and maximum likelihood intervals coincide, though 
Good [1965, p. 16] cautions against the magnetic lures of maximum likeli- 
hood, noting that its asymptotic properties are no better than those exhib- 
ited by Bayesian methods. (Earlier on in his monograph Good noted “the 
inconsistency of maximum-likelihood estimation with a Bayesian philoso- 
phy” [p. 4].) That great exponent of Bayesian methods, Harold Jeffreys, 
was similarly lukewarm in his recommendations to users of maximum likelihood, 
writing 


in the great bulk of cases its results are indistinguishable from 
those given by the principle of inverse probability, which sup- 
plies a justification of it. [1961, p. 194] 


A major step was taken by Strasser [1981], who showed that every set of 
conditions implying consistency of the maximum likelihood method also 
implies consistency of Bayes estimates for a large class of priors. 
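
Edwards's remark on the analytical identity of the two methods under a uniform prior can be illustrated in the binomial case. The sketch below is an illustration only, not a reconstruction of any of the cited authors' arguments: the data (7 successes in 10 trials), the grid and the non-uniform prior proportional to p(1 − p) are all hypothetical choices.

```python
from math import log


def log_likelihood(p, s, n):
    """Log of p**s * (1-p)**(n-s), the binomial likelihood up to a constant."""
    return s * log(p) + (n - s) * log(1 - p)


def grid_argmax(log_score, grid):
    """Return the grid point at which log_score is largest."""
    return max(grid, key=log_score)


s, n = 7, 10                               # hypothetical data
grid = [i / 1000 for i in range(1, 1000)]  # interior points of (0, 1)

# Maximum likelihood estimate.
mle = grid_argmax(lambda p: log_likelihood(p, s, n), grid)

# Posterior mode under a uniform prior: log prior is the constant log(1) = 0.
uniform_mode = grid_argmax(lambda p: log(1.0) + log_likelihood(p, s, n), grid)

# Posterior mode under a non-uniform prior proportional to p(1 - p).
informative_mode = grid_argmax(
    lambda p: log(p * (1 - p)) + log_likelihood(p, s, n), grid
)

print(mle, uniform_mode, informative_mode)
```

Under the uniform prior the posterior is proportional to the likelihood, so the two maximizing values coincide at s/n; under the non-uniform prior the mode moves to (s + 1)/(n + 2), here 2/3 to within the resolution of the grid.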

Care must of course be taken not to confuse the method of maximum 
likelihood with likelihood pure and simple (though, as Oscar Wilde said of 
Truth, likelihood “is rarely pure and never simple”). The latter, developed 
by R.A. Fisher, is offered as an alternative mode of inference for those 
unhappy with both inverse probability and significance tests. The main 
building-block here is L[H|R], the likelihood of the hypothesis H given R, 
a quantity which, having H as variable and R as constant, is defined as 
being proportional to Pr[R|H], the probability of (the variable) R given 
(the hypothesis) H. Put somewhat rudely, the likelihood is what remains 
of Bayes’s Theorem once the prior is removed from the discussion²⁸. The 
philosophy and use of likelihood has been vigorously expounded by Ed- 
wards [1972]. 

The curiously taut (and sometimes similarly taught) relationship be- 
tween likelihood and inverse probability, as well as the precept and practice 
of some of their proponents, have been summarised by Jeffreys as follows: 


Pearson, in his last paper, violently attacked Fisher’s methods, 
and Fisher replied. The odd thing was that Fisher’s likelihood 
method, in the case considered, is completely justified by in- 
verse probability, which was used in the Grammar of Science, 
whereas Fisher paid no attention whatever to our justification. 
Pearson’s method ignores likelihood altogether, calculating a 
number of functions of the observed values chosen for no gen- 
eral rule and usually highly correlated. That is, Fisher used a 
method that followed from Pearson’s principles, while Pearson 
himself objected to it. [1974, p. 2] 

In a recent study of Fisher’s early use of the phrase “inverse probability”, 
Edwards concluded 


that in 1912 by inverse probability Fisher meant likelihood; that 
in 1916 by the principle of inverse probability he meant the 
Laplace-de Morgan principle which he thought conferred legit- 
imacy on the method of maximum likelihood; and that only as 
late as 1921-1922 did he fully appreciate that this principle was 
inescapably Bayesian and had to be rejected. [1993, p. 11] 


The importance of Bayes’s Theorem in subjective theories has led to the 
development of what is generally called Bayesian Statistics, though whether 
Bayes was himself a Bayesian is moot. It is perhaps of some small interest 
to note that the question whether Mr X, now perceived as the founder of 
a school of thought that has become known as “X-ianism” or “X-ianity”, 
was in fact himself an X-ian, is one that is often asked. Gillies posed the 
question in regard to Bayes in 1987, de Morgan having written in similar 
vein in 1855 “The question whether Copernicus himself was a Copernican 
in the modern sense of the word is not easily settled” [pp. 6-7]. (This last 
quotation is in fact not as irrelevant to our theme as it might at first appear 
to be, for the introduction of inverse probability occasioned a revolution as 
important to statistical thinking as the work of the great Prussian was to 
astronomy — and as potentially embarrassing to conventional thinking.) 

In his Gresham Lecture in 1893 Whitworth noted that an “eminent 
professor²⁹” had described the whole theory of inverse probability as “a 
delusion and a snare” [1897/1945, p. xix] (at least he spared us Thomas, 
Lord Denman’s further term: “a mockery”), and he himself said elsewhere 
that 


The term “Inverse Probability” appears to be unnecessary and 
misleading. [1901/1942, p. 184] 


Had we believed these words this book would not have been written. It 
is our hope that the recollection and examination of the origins and early 
development of inverse probability will show both its necessity and its rôle 
as a trustworthy guide in scientific inference today. 


Thomas Bayes: a 
biographical sketch 


If those whose names we rescue from 
oblivion could be consulted they might tell 
us they would prefer to remain unknown. 


Matthew Whiteford. 


Most authors of papers or articles devoted to biographical comments on 
Thomas Bayes preface their remarks with an Apologia for the paucity of 
pertinent particulars. In 1860 we find de Morgan publishing a request in 
Notes and Queries for more information on Bayes, listing, in no more than a 
few paragraphs, all that he knows. In 1974 Maistrov, in what was probably 
to that date the most complete and authoritative¹ history of probability 
theory since Todhunter’s classic of 1865, bemoaned the fact that 


biographical data concerning Bayes is scarce and often mislead- 
ing... Even in the “Great Soviet Encyclopedia” (BSE) there is 
no mention of his birthdate and the date of his death is given 
incorrectly as 1763. [pp. 87-88] 


But no national shame need be felt by the Soviets on this account: the 
Dictionary of National Biography (ed. L. Stephen), though devoting space 
to Thomas’s father, is stubbornly silent on the perhaps more illustrious 
son², while the Encyclopedia Britannica has apparently³ no entry under 
“Bayes” until the fourteenth edition, post 1958, where a brief biographical 
note may be found. The only earlier work of general reference to contain 
a biographical note on Thomas Bayes, as far as has been ascertained, is 
J.F. Waller’s edition of the Imperial Dictionary of Universal Biography⁴ of 
1865. 

The information conveyed in the present work is, unfortunately, almost 
as exiguous: indeed, for one whose work has come to play such an important 
role in modern statistical theory and practice (and hence in modern science 
in general), Thomas Bayes has been singularly successful in preserving a 
large measure of personal (and public) privacy. 

Thomas, the eldest child of Joshua and Ann Bayes, was born in 1701 
or 1702 (the latter date seems generally favoured, but the present epitaph 
in the Bunhill Fields Burial Ground, by Moorgate, merely gives his age at 
death⁵, in April 1761, as 59). The place of his birth is subject to similar 
uncertainty: the received preference seems to be for London⁶, but Holland 
surmises that “his birthplace was in Hertfordshire” [1962, p. 451]. As luck 
would have it, however, the parish registers of Bovingdon, Hemel Hempstead, 
Herts. (where Joshua is supposed to have ministered at Box Lane⁷) 
for 1700-1706 have apparently gone astray. 

Of Thomas Bayes’s early childhood little is known. While some sources 
assert that he was “privately educated”, others⁹ believe he “received a lib- 
eral education for the ministry”: the two views are perhaps not altogether 
incompatible. Some light can perhaps be shed on the question of Thomas’s 
schooling from the existence of a Latin letter to him from John Ward, a let- 
ter dated 10. kal. Maii 1720 and distinctly schoolmasterish in its advocation 
of the importance of the cultivation of style in writing. John Ward (1679?- 
1758), the son of the dissenting minister John Ward, was, according to the 
Dictionary of National Biography, a clerk in the navy office until leaving 
it in 1710 to open a school in Tenter Alley, Moorfields. The Imperial Dic- 
tionary of Universal Biography is perhaps more careful in stating merely 
that Ward, in 1710, “exchanged his clerkship for the post of a schoolmaster 
in Tenter Alley”. Ward was elected a fellow of the Royal Society on 30th 
November 1723 and, on his death, was interred in Bunhill Fields. 

John Eames was assistant tutor in classics and science at the Fund 
Academy¹⁰ in Tenter Alley, succeeding Thomas Ridgeley as theological 
tutor on the latter’s death in 1734. It is indeed tempting to suppose that 
Thomas Bayes was a pupil at the school at which both Eames and Ward 
taught, but this is mere conjecture (see Appendix 2.2 for further discussion 
of this matter). In fact, Bayes’s name does not appear in a still extant list 
of Ward’s students. 

What Thomas could have studied at the Fund Academy is uncertain, the 
Latin letter referred to above merely indicating the importance Ward attached 
to the classics and the mathematical sciences (“mathesi”)¹¹. Where 
he could have picked up his knowledge of probability is unknown: there 
is, to our mind, little evidence supporting Barnard’s theory that he might 
have had some contact with “poor de Moivre”¹², at that time eking out a 
precarious existence by teaching mathematics at Slaughter’s Coffee House 
in St Martin’s Lane¹³, or, according to Pearson [1978] 


sitting daily in Slaughter’s Coffee House in Long Acre, at the 
beck and call of gamblers, who paid him a small sum for cal- 
culating odds, and of underwriters and annuity brokers who 
wished their values reckoned. [p. 143] 


There is, however, more evidence for Holland’s [1962, p. 453] tentative 
suggestion that he might, after all, have been educated further afield, as 
recent research has disclosed¹⁴. For in a catalogue of manuscripts in the 
Edinburgh University Library the following entry may be found: 


Edinburgi Decimo-nono Februarij Admissi sunt hi duo Juvenes 
praes. D. Jacobo. Gregorio Math. P. Thomas Bayes. Anglus. 
John Horsley. Anglus. 


The year of admission is 1719. The entries in this manuscript bear the 
signatures of those admitted: that of Bayes is markedly similar to the one 
found in the records of the Royal Society. 

Bayes’s name also appears in the Matriculation Album of Edinburgh 
University under the heading 


Discipuli Domini Colini Drummond qui vigesimo-septimo die 
Februarii, MDCCXIX subscripserunt 


and further evidence of his presence may be found in the List of Theologues 
in the College of Edinburgh since October 1711 (the date is obscure), in 
which Thomas’s entry to both the College and the profession is given as 
1720. He is stated as being recommended by “Mr Bayes”, presumably his 
father Joshua. What are possibly class lists give Thomas’s name in the fifth 
section in both 1720 and 1721. In a further list, this time of the prescribed 
theological exercises to be delivered, we find Bayes mentioned twice: on 
14th January 1721 he was to deliver the homily on Matthew 7, vs 24-27, 
and on 20th January 1722 he was to take the same role, the text in this 
case being Matthew 11, vs 29-30. Finally, he is mentioned in the list of 
theological students in the University of Edinburgh, from November 1709 
onwards, as having been licensed, but not ordained. A full list of references 
to Bayes in the records of that University is given in Appendix 2.4. 

It is perhaps hardly surprising that Thomas, coming as he did from a 
family strong in nonconformity, should have sought ordination as a non- 
conformist minister. When this ordination took place we do not know: the 
only thing we know with some degree of certainty is that it must have been 
during or before 1727; for in Dr John Evans’s (1767-1827) list of “Ap- 
proved Ministers of the Presbyterian Denomination” for that year we find 
Thomas’s name¹⁵. We suspect also that Thomas had assisted his father 
at Leather Lane for some years¹⁶ from 1728 before succeeding¹⁷ the Rev. 
John Archer as minister at the meeting-house, Little Mount Sion¹⁸, in Tunbridge 
Wells¹⁹. Whiston [1749, Pt.II] describes Bayes as “a successor, tho’ 
not immediate to Mr. Humphrey Ditton”²⁰ [p. 390]. James [1867], in his 
second appendix, entitled “Particular account of Presbyterian chapels, and 
list of Baptist chapels in England, 1718-1729”, has the following entry: 


Tunbridge Wells, John Archer [Presbyterian congregation ex- 
tinct, chapel reopened by Independents]. [p. 664] 


This reopening must have occurred after the death of Bayes, who was a 
presbyterian. 

The 1730’s saw a virulent attack on Sir Isaac Newton’s work on fluxions²¹. 
The metaphysical side of this work was attacked by Bishop Berkeley in 1734 
in his The Analyst; or, a Discourse addressed to an Infidel Mathematician, 
London²². This prompted replies from Dr Jurin²³ and J.A. Walton, followed 
by further rebuttal from Berkeley in 1735²⁴. A strong defence of 
Newton appeared in a tract²⁵ entitled An Introduction to the Doctrine of 
Fluxions, and Defence of the Mathematicians against the Objections of the 
Author of the Analyst, so far as they are designed to affect their general 
Methods of Reasoning, John Noon, London, 1736. In his question in Notes 
and Queries, de Morgan writes “This very acute tract is anonymous, but 
it was always attributed to Bayes by the contemporaries who write in the 
names of authors; as I have seen in various copies: and it bears his name 
in other places” [1860, p. 9]. 

It appears, on the face of it, that this latter work was the sufficient 
cause²⁶ of Bayes’s election as a Fellow of the Royal Society in 1742, for 
it was not until about 1743 that a resolution was taken by the Society²⁷ 
“not to receive any person as a member who had not first distinguished 
himself by something curious”²⁸. The certificate (dated London April 8, 
1742) proposing Bayes for election reads as follows²⁹ 


The Revᵈ. Mʳ. Thomas Bays [sic] of Tunbridge Wells, Desiring 
the honour of being Elected into this Society; We propose and 
recommend him as a Gentleman of known merit, well skilled 
in Geometry and all parts of Mathematical and Philosophical 
Learning, and every way qualified to be a valuable member of 
the same. 


It is signed: Stanhope, James Burrow, Martin Folkes, Cromwell Mortimer 
and John Eames. 

In the New General Biographical Dictionary Rose writes: “He [i.e. Thomas 
Bayes] was distinguished for his mathematical attainments, which led to 
his being elected a fellow of the Royal Society” [1848]. From those of 
Bayes’s writings that have come down to us, we can only assume, as already 
stated, that his fellowship came about as a result of his contribution to the 
Berkleian dispute³⁰. 

While no other scientific or mathematical work published by Bayes be- 
fore his election (and in the light of which the latter might prove more 
explicable) has come to light, a notebook³¹ of his is preserved in the muniment 
room of the Equitable Life Assurance Society, through the careful 
offices of Richard Price and his nephew William Morgan³². Here, among 
other curiosities, are details of an electrifying machine, lists of English 
weights and measures, notes on topics in mathematics, natural philosophy 
and celestial mechanics, the complete key to a system of shorthand³³, and, 
most important for our purposes, a proof of one of the rules in the Essay, 
to which proof we shall return in Chapter 4. 

Two further works by Thomas Bayes appeared after his death. In 1764, a 
“Letter from the late Reverend Mr. Thomas Bayes, F.R.S. to John Canton, 
M.A. & F.R.S.” was published in the Philosophical Transactions (read 24th 
November 1763). This short note (a scant two pages) deals with divergent 
series, in particular the Stirling-de Moivre Theorem³⁴, viz. 


\[ \log x! = \log\sqrt{2\pi} + \left(x + \tfrac{1}{2}\right)\log x - S, \]


where 


\[ S = x - \frac{1}{12x} + \frac{1}{360x^3} - \frac{1}{1260x^5} + \frac{1}{1680x^7} - \frac{1}{1188x^9} + \cdots \]
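
The divergence of this series can be seen numerically. The sketch below is offered only as an illustration: it takes the five denominators 12, 360, 1260, 1680 and 1188 displayed above, assumes natural logarithms and alternating signs after the leading term x (as in the standard Stirling series), and compares the partial sums with the value of S obtained from `math.lgamma`. The function names are hypothetical.

```python
from math import lgamma, log, pi

# Denominators of the correction terms 1/(d * x**(2k+1)), k = 0, 1, 2, ...
DENOMINATORS = [12, 360, 1260, 1680, 1188]


def series_partial_sums(x):
    """Partial sums of S = x - 1/(12x) + 1/(360x^3) - ... (signs alternate)."""
    total = float(x)
    sums = [total]
    sign = -1.0
    for k, d in enumerate(DENOMINATORS):
        total += sign / (d * x ** (2 * k + 1))
        sums.append(total)
        sign = -sign
    return sums


def exact_S(x):
    """S = log sqrt(2 pi) + (x + 1/2) log x - log x!, with log x! = lgamma(x + 1)."""
    return 0.5 * log(2 * pi) + (x + 0.5) * log(x) - lgamma(x + 1)


def errors(x):
    """Absolute error of each partial sum against the exact value of S."""
    target = exact_S(x)
    return [abs(partial - target) for partial in series_partial_sums(x)]


for x in (1, 5):
    print(x, errors(x))
```

For x = 1 the error falls for the first few terms and then begins to rise again, which is the behaviour of a divergent (asymptotic) series; for x = 5 the terms retained here are all still improving the approximation.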


The same volume (LIII) of the Philosophical Transactions contains, as 
the fifty-second article, “An Essay towards solving a Problem in the Doc- 
trine of Chances. By the late Rev. Mr. Bayes, F.R.S. communicated by 
Mr. Price, in a Letter to John Canton, A.M. F.R.S”, and it is to this essay 
that we now turn our attention³⁵. (This essay was followed by Bayes’s 
(and Price’s) “A Demonstration of the Second Rule in the Essay towards 
the Solution of a Problem in the Doctrine of Chances, published in the 
Philosophical Transactions, Vol. LIII”. This memoir occupies pp. 296-325 
of Volume LIV of the Philosophical Transactions.) 

While almost all that is known about Thomas Bayes has been mentioned 
above, there are some facts about other members of his family that might 
be of some interest to the reader. 

Thomas’s paternal grandfather was Joshua Bayes, who was baptised on 
the 6th May 1638 and was buried on the 28th August 1703. Like his father 
Richard, Joshua was a cutler in Sheffield, and in 1679, like his father before 
him, he was Master of the Company of Cutlers of Hallamshire. In 1683- 
1684 he was Town Collector, and he also served a spell as Trustee for the 
town³⁶. 

According to the Reverend A.B. Grosart, writing in the Dictionary of 
National Biography³⁷, Joshua’s elder brother Samuel was “ejected by the 
Act of Uniformity of 1662 from a living in Derbyshire, and after 1662 
lived at Manchester until his death”. (This act, passed by the anti-puritan 
parliament after the restoration of Charles II, provided that “all ministers 
not episcopally ordained or refusing to conform should be deprived on St. 
Bartholomew’s Day, the 14th of August³⁸ following”.) It is possible that 
Samuel did not in fact leave his parish until 1665, when³⁹ “ejected ministers 
were forbidden to come within five miles of their former cures”. 

Grosart is substantially correct, apart from the fact that he refers to 
Samuel rather than Joshua as Thomas’s grandfather, for in Turner [1911] 
we find the following records⁴⁰ 


Licence to Sam: Buze to be a Pr[eacher] Teacher in his howse
in Manchester 


and 


Licence to Sam: Bayes of Sankey in Lancash: to be a Pr[eacher]:
Teachr. Sepr 5th [1672]


(Turner [1911, vol. 1, pp. 518, 556]), while in volume 2 [p. 677] of the same 
work we find 


Sankey.(1) Samuel Bayes (t") (cal. iii, 35), ej. from Grendon, 
Northants. (2) New Meeting House (m[eeting] pl[ace]). 


The most complete, and most accurate, biographical sketch of Samuel 
Bayes is to be found in Matthews [1934]. It runs in full as follows: 


Bayes, Samuel. Vicar of Grendon, Northants. 1660. Adm. 16 
Dec. 1657. Successor paid cler. subsidy 1661. Son of Richard, 
of Sheffield, cutler, by 2nd wife, Alice Chapman. Bap. there 
31 Jan. 1635-6. Trinity, Camb. mc. 1652: Scholar 1655: BA. 
1656. Minister at Beauchief Abbey, Derbs. Licensed (P.), as of 
Sankey, Lancs., 5 Sep. 1672; also, as Buze, at his house, Manch- 
ester. Mentioned in father’s will 15 March 1675-6: p.13 July 
1677. Died c.1681, when Joshua Bayes, of Sheffield, was found 
his brother and heir. Joshua Bayes (1671-1746), minister in 
London, his nephew, not his son. [p. 40] 


Even Joshua Bayes (Thomas’s father) is not immune from biographical 
confusion. Holland [1962] states (correctly) that “Joshua was the nephew of 
Samuel Bayes of Trinity College, Cambridge, ejected minister of Grendon 
in Northamptonshire” [p. 452], a view that is supported by Rose [1848] who 
asserts further that Joshua was “the son of Joshua Bayes of that town [viz. 
Sheffield], and nephew to Samuel Bayes”. Wilson writes that Samuel Bayes 
(father of Joshua), a native of Yorkshire and educated at Trinity College,
Cambridge,


enjoyed the living of Grendon in Northamptonshire, which he 
lost at the Restoration; and he seems afterwards to have had 
another living in Derbyshire, but was obliged to quit that also 
upon the passing of the Bartholomew Act, in 1662. Upon being 
silenced, he retired to Manchester, where he lived privately until 
his death. [1814, vol. 4, p. 396] 

On the 15th November 1686, Joshua was entrusted to the tender care of
the “reverend and learned Mr.” Richard Frankland of Attercliffe,
Yorkshire, the founder of the first academy for nonconformists and one who,
subjected to the buffeting of the winds of orthodox persecution, moved his
academy, together with his pupils, from place to place.

There Joshua pursued his studies “with singular advantage”, and at
their conclusion proceeded to London, where, on the 22nd of June 1694, he
was one of the first seven candidates (not the first, as stated by Pearson)
to be publicly ordained “according to the practice of the times”. This
ordination, the first public ceremony of such nature among dissenters in the
city after the Act of Uniformity, took place at the meeting-house of Dr
Annesley, Bishops-gate Within, near Little St Helens.

Having been ordained “preacher of the gospel and minister” [Stephen
1885], Joshua seems to have become a peripatetic preacher, serving churches
around London before settling down at St Thomas’s Meeting-house in
Southwark, as assistant to John Sheffield (“one of the most original of
the later puritan writers”) in 1706 or thereabouts. Since this calling
required his attendance on Sunday mornings only, Joshua also acted as
assistant to Christopher Taylor of Leather Lane in Hatton Garden, London.
While engaged in this two-fold assistantship, Joshua was one of a panel of
presbyterian divines engaged to complete Matthew Henry’s (1662-1714)
“Commentary on the Bible”, his special charge being the Epistle to the
Galatians.

On succeeding to Taylor’s pastorate on the latter’s death in 1723,
Joshua resigned his morning service duties at St Thomas’s. Feeling the
weight of advancing years, he “confined his labours chiefly to one part
of the day” [Wilson 1814], being assisted on the other part firstly by John
Cornish (d.1727) and then by his own son Thomas (appointed in 1728).
When Dr Calamy died in 1732, the Merchants’ lectureship at Salters’ Hall
fell vacant, and Joshua was chosen to fill the vacancy. In a special course of
lectures delivered by a company of divines at Salters’ Hall in 1735, directed
against Popery, Joshua expounded on “The Church of Rome’s Doctrine
and Practice with relation to the Worship of God in an unknown tongue.”

As far as can be ascertained, Joshua’s only other published writings were 
some sermons. These are listed by Nicholson and Axon [1915] as, in addi- 
tion to the above, (1) A funeral sermon occasioned by the death of Mr. J. 
Cornish, preached Dec. 10, 1727, [1728]; (2) A funeral sermon occasioned 
by the death of the Rev. C. Taylor, [1723]; and (3) A sermon preach’d to 
the Societies for the Reformation of manners, at Salters’ Hall, July 1, 1723 
[1723]. There is no evidence of any mathematical or scientific discourse,
and we may (must?) therefore view with some measure of suspicion the
statement that he was a Fellow of the Royal Society. Joshua died on
24th April, 1746 (in his 76th year and the 53rd of his ministry), being
buried in Bunhill Fields, in a grave later to be shared by other members
of his family.

Before taking leave of Joshua Bayes, let us see what Wilson had to say:


Mr. Bayes was a man of good learning and abilities; a judicious, 
serious and exact preacher; and his composures for the pulpit 
exhibited marks of great labour. In his religious sentiments he 
was a moderate Calvinist; but possessed an enlarged charity 
towards those who differed from him. His temper was mild and 
amiable; his carriage free and unassuming; and he was much 
esteemed by his brethren of different denominations. Though 
his congregation was not large, it consisted chiefly of persons 
of substance, who contributed largely to his support, and collected
a considerable sum annually for the Presbyterian fund.


[1814, p. 399] 


Thomas was the eldest son of Joshua Bayes (1671-1746) and Ann
Carpenter (1676-1733). He had six siblings: Mary (1704-1780), John
(1705-1743), Ann (1706-1788), Samuel (1712-1789), Rebecca (1717-1799) and
Nathaniel (1722-1764). The only references to any of the children, apart
from Thomas, we have managed to find are (a) the mention of John, and 
his father, in the list of subscribers to Ward’s Lives of the Professors of 
Gresham College, and (b) the following obituary from The Gentleman’s 
Magazine and Historical Chronicle for 1789: 


Oct. 11. At Clapham, Sam. Bayes, esq. formerly an eminent
linen-draper in London, son of the Rev. Mr. Sam [sic] Bayes, an
eminent dissenting minister. His lady died a few weeks before
him. [vol. 59, p. 961]


In the 1730’s vitilitigation arose on the following matter: God was not 
compelled to create the universe; why, then, did He do so? The Anglican 
divine Dr John Balguy (1686-1748) started the (published) debate with his 
pamphlet Divine Rectitude, or a Brief Inquiry concerning the Moral Per- 
fections of the Deity; Particularly in respect of Creation and Providence, 
London, 1730. This was followed by a rebuttal, attributed to Thomas
Bayes, entitled Divine Benevolence, or an attempt to prove that the Prin- 
cipal End of the Divine Providence and Government is the Happiness of 
his Creatures. Being an answer to a Pamphlet entitled: “Divine Rectitude: 
or an Inquiry concerning the Moral Perfections of the Deity”. With a Reg- 
ulation of the Notions therein advanced concerning Beauty and Order, the 
Reason of Punishment, and the Necessity of a State of Trial antecedent to 
perfect Happiness, London, printed by John Noon at the White Hart in 
Cheapside, near Mercers Chapel, 1731. Not satisfied with either
“Rectitude” or “Benevolence” as the motive for creation, Henry Grove
(1684-1738) found the answer in “Wisdom”, and expounded this in his tract of
1734: Wisdom, the first Spring of Action in the Deity; a discourse in which,
Among other Things, the Absurdity of God’s being actuated by Natural
Inclinations and of an unbounded Liberty, is shewn. The Moral attributes
of God are explained. The Origin of Evil is considered. The Fundamental
Duties of Natural Religion are shewn to be reasonable; and several things
advanced by some late authors, relating to these subjects, are freely
examined.

The first two of the above-mentioned pamphlets were published
anonymously, but there seems little doubt that the authorships have been
correctly attributed. Remarking on the polemic in general, Pearson [1978]
writes 


On the whole Balguy and Grove may be held to have had the 
better of the controversy because they considered in opposition 
to Bayes that God may have ends in view, distinct from and 
sometimes interfering with the happiness of his creatures. This 
controversy rather shows Bayes as a man desiring a loving and 
paternal deity than as a good logician or a fluent writer. [p. 359] 


At the time, however, Bayes’s tract was apparently well received, for we
read in Walter Wilson’s The History and Antiquities of Dissenting Churches
and Meeting Houses that it “attracted notice and was held in high
esteem”, and that, compared to those of Balguy and Grove, “Mr. Bayes’s
scheme was more simple and intelligible” [Wilson 1814, p. 402].

The next recorded reference to Thomas Bayes that we have is due to
William Whiston (Newton’s successor in the Lucasian Chair at
Cambridge), in whose Memoirs of his Life we find the following:


Memorandum. That on August the 24th this Year 1746, being 
Lord’s Day, and St. Bartholomew’s Day, I breakfasted at Mr. 
Bay’s [sic], a dissenting Minister at Tunbridge Wells, and a 
successor, tho’ not immediate to Mr. Humphrey Ditton, and 
like him a very good Mathematician also. [1749, pt. II, p. 390] 


In his authoritative biographical note to his 1958 edition of Bayes’s Essay 
in Biometrika, Barnard states that “Whiston goes on to relate what he said 
to Bayes, but he gives no indication that Bayes made reply” [p. 294]. That 
this is a slip is evidenced by the continuation of the preceding quotation 
from Whiston’s Memoirs, viz.


I told him that I had just then come to a resolution to go 
out always from the public worship of the Church of England, 
whenever the Reader of Common Prayer read the Athanasian 
Creed; which I esteemed a publick cursing [of] the Christians: 
As I expected it might be read at the Chapel that very Day, 
it being one of the 13 Days in the Year, when the Rubrick 
appoints it to be read. Accordingly I told him that I had fully 
resolved to go out of the Chapel that very Day, if the Minister 
of the Place began to read it. He told me, that Dr. Dowding the 
Minister, who was then a perfect Stranger to me, had omitted it 
on a Christmas-Day, and so he imagined he did not use to read
it. This proved to be true, so I had no Opportunity afforded 
me then to shew my Detestation of that Monstrous Creed; Yet 
have I since put in Practice that Resolution, and did so the first 
Time at Lincolns Inn Chapel on St. Simon and St. Jude’s Day 
October 28, 1746, when Mr. Rawlins began to read it, and I 
then went out and came in again when it was over, as I always 
resolved to do afterwards. 


In April 1746, as already mentioned, Joshua Bayes died, leaving £2,000 
and his library to Thomas, with similar bequests to his other children and 
his siblings amounting to some £10,000 in all. A little over a month after
drawing up his will Joshua added a codicil in which the bequest of £1,400 
to his daughter Rebecca was revoked, so that she might not be subject 
to the debts of her husband, Thomas Cotton. She was, however, left £40 
for mourning, and the original amount was left in trust, with her brothers 
Thomas and Samuel as trustees, for her son, Joshua Cotton. 

In 1749 Thomas Bayes became desirous of retiring from his cure, and
to this end he opened his pulpit to various Independent ministers from
London. This arrangement was suddenly terminated on Easter Sunday
in 1750, when, disliking the Independents’ doctrine, Bayes resumed his
pulpit. (This point is reported rather differently by Barnard [1958], who
states that Bayes “allowed a group of Independents to bring ministers from
London to take services in his chapel week by week, except for Easter, 1750,
when he refused his pulpit to one of these preachers” [p. 294].) There is
something strange about all this; why, after the successful implementation
of this system in 1749 (“All that summer of 1749 we had supplies from
London, Sabbath after Sabbath; ’twas indeed a summer to be remembered”),
did Bayes suddenly put a stop to it? We shall probably never know.
However, he seems to have left his cure in about 1750 (though he remained in
Tunbridge Wells until his death), his successor at Little Mount Sion being
the Rev. William Johnson (or Johnstone or Johnston).

On the 7th April 1761 Thomas Bayes died, and he was interred in
the family vault in Bunhill Fields. Most of Thomas’s inheritance from
his father was left to his (Thomas’s) family and friends, including £200 to 
be divided between John Hoyle and Richard Price. Also named were “my 
Aunt Wildman...my cousin Elias Wordsworth and my cousin Samuel Wild- 
man”. A substantial bequest of “five hundred pounds and my watch made 
by Ellicot and all my linnen and wearing apparell and household stuff” was 
made to Sarah Jeffery, “daughter of John Jeffery living with her father at 
the corner of Jourdains lane at or near Tonbridge Wells”.

Holland [1962, p. 452] has somewhat hesitantly put forward the suggestion
that Thomas Bayes might have been educated at Coward’s Academy.
The discussion in this appendix will, I trust, set this suggestion at nought. 

In 1695 the Congregational Fund Board, originally supported by both 
Presbyterians and Independents, established an academy in Tenter Alley, 
Moorfields. Thomas Godwin was appointed Tutor to the Board in 1696 or 
1697 (Dale [1907, p. 506]), and was succeeded in the principal charge of 
the students by Isaac Chauncey (or Chauncy), who had initially been
appointed in 1699. Chauncey died in 1712, and Thomas Ridgeley followed
him as theological tutor, being succeeded in turn by John Eames
(F.R.S. 1724), who had previously “held the chair of Philosophy and Lan- 
guages” (Dale [1907, p. 501]). In 1744 the Fund Academy was united with 
the Academy of the King’s Head Society, the union being represented by 
Homerton College until 1850. 

Philip Doddridge (1702-1751) opened an academy at the beginning of
July 1729 at Market Harborough. In December of that year the academy 
was moved to Northampton, Doddridge having been called by an Indepen- 
dent congregation at Castle Hill. In 1733 “an ecclesiastical prosecution was 
commenced against Doddridge for keeping an Academy in Northampton” 
(Dale [1907, p. 518]), a case speedily quashed by the Crown, King George II
refusing to allow persecution for conscience’ sake. After Doddridge’s death 
the Academy was moved to Daventry, its deceased head being succeeded by 
Caleb Ashworth, Thomas Robins and Thomas Belsham in turn. The latter 
resigned on finding that he could not conscientiously teach the doctrines 
required by the Coward Trustees, who maintained the Academy and had 
subsidized it from 1738. The Academy was moved back to Northampton, with
John Horsey as theological tutor: he, being suspected of unorthodoxy, was 
removed in 1798 by the Trustees and the Academy was dissolved. It was 
restarted the next year in Wymondley, Hertfordshire, where it remained un- 
til 1832 when it was established as Coward College in Torrington Square, 
London. Here the theological teaching was carried out by Thomas Morell, 
the former Tutor of the Academy, while other subjects were taught by Uni- 
versity College, London. 

In 1778 an “Academy” for the training of evangelists was established by 
the Societas Evangelica (founded 1776). In the next few years a more
liberal course of education was adopted, and in 1791 the Evangelical Academy
moved to Hoxton Square as the Hoxton Academy. In 1825 it was moved to 
Highbury Park and became Highbury College. 

In 1850 the three colleges — Homerton, Coward and Highbury (or Hox- 
ton) — were united to form New College. 

William Coward, a London merchant noted for what the Dictionary of 
National Biography calls “his liberality to dissent”, continued, while alive, 
“to assist the poorer ministers and to aid in the teaching of their children.” 
On his death, at age 90, at Walthamstow on 28th April 1738, his
property was valued at £150,000, the bulk of which was left in charity. As we
have mentioned, it was Coward’s Trustees who later took over Doddridge’s 
Academy. 

From the preceding discussion it seems quite clear that anything known 
as Coward’s Academy would have been formed far too late to have been 
attended by Bayes. Since, however, Holland cites as evidence for Bayes’s 
possible attendance at Coward’s the fact that John Eames was one of his 
sponsors for election to the Royal Society on 4th November 1742, it is 
possible that he was in fact referring to the Fund Academy. 

There exists an anecdote concerning Bayes that is reported by Bellhouse
[1988b]. The passage, from Phippen [1840], runs as follows: 


During the life of Mr. Bayes, an occurrence took place which is 
worthy of record. Three natives of the East Indies, persons of 
rank and distinction, came to England for the purpose of ob- 
taining instruction in English literature. Amongst other places, 
they visited Tunbridge Wells, and were introduced to Mr. Bayes, 
who felt great pleasure in furnishing them with much useful and 
valuable information. In the course of his instructions, he en- 
deavoured to explain to them the severity of our winters, the 
falls of snow, and the intensity of the frosts, which they did not 
appear to comprehend. To illustrate in part what he had stated, 
Mr. Bayes procured a piece of ice from an ice-house, and shewed 
them into what a solid mass water could be condensed by the 
frost — adding that such was the intense cold of some winters, 
that carriages might pass over ponds and even rivers of water 
thus frozen, without danger. To substantiate his assertion, he 
melted a piece of the ice by the fire, proving that it was only 
water congealed. ‘No’, said the eldest of them, ‘It is the work 
of Art! — we cannot believe it to be anything else, but we will 
write it down, and name it when we get home’. [p. 97] 


It is not known who these travellers were, or when their visit took place. 
Similar tales are recounted in David Hume’s essay Of Miracles and in John 
Locke’s Essay concerning Human Understanding. 

The complete list (as far as has been ascertained) of references to Bayes in
the archives of Edinburgh University, in no particular order, runs as follows
(the references in square brackets are the shelf-marks of the university’s
special collections department):


1. [Da]. Matriculation Roll of the University of Edinburgh. Arts-Law-Divinity.
   Vol. 1, 1623-1774. Transcribed by Dr. Alexander Morgan,
   1933-1934. Here, under the heading “Discipuli Domini Colini Drummond
   qui vigesimo-septimo die Februarii, MDCCXIX subscripserunt”,
   we find the signature of Thomas Bayes. This list contains the names
   of 48 students of Logic.


2. [Da.1.38] Library Accounts 1697-1765. Here, on the 27th February
   1719, we find an amount of £3-0-0 standing to Bayes’s name, and
   the same amount to John Horsley, Isaac Maddox and Skinner Smith.
   All of these are listed under the heading “supervenientes”, i.e. “such
   as entered after the first year, either coming from other universities, or
   found upon examination qualified for being admitted at an advanced
   period of the course” (Dalzel, [1862, vol. II, p. 184]).


3. Leges Bibliothecae Universitatis Edinensis. Names of Persons admitted
   to the Use of the Library. The pertinent entry here runs as follows:


      Edinburgi Decimo-nono Februarij Admissi sunt hi duo Juvenes
      praes. D. Jacobo. Gregorio Math. P. Thomas Bayes.
      Anglus. John Horsley. Anglus.


Unfortunately no further record has been traced linking Bayes to this 
eminent mathematician. 


4. [Dc.5.247]. In the Commonplace Book of Professor Charles Mackie,
   we find, on pp. 203-222, an Alphabetical List of those who attended
   the Prelections on History and Roman Antiquitys from 1719 to 1744
   Inclusive. Collected 1 July, 1746. Here we have the entry


Bayes ( _—+), Anglus. 1720,H. = 21,H. 3 


The import of the final “3” is uncertain. 


5. Lists of Students who attended the Divinity Hall in the University of
   Edinburgh, from 1709 to 1727. Copied from the MSS of the Revd. Mr.
   Hamilton, then Professor of Divinity, etc. Bayes’s name appears in
   the list for 1720, followed by the letter “J”, indicating that he was
   licensed (though not ordained).


6. List of Theologues in the College of Edin[burgh] since Oct:1711. the
   1st. columne contains their names, the 2d the year of their quiimvention,
   the 3d their entry to the profession, the 4th the names of those
   who recommend them to the professor, the 5th the bursaries any of
   them obtain, the 6th their countrey and the 7th the exegeses they had
   in the Hall. Here we have

   Tho.Bayes | 1720 | 1720 | Mr Bayes | - | London | E. Feb. 1721. E. Mar. 1722.


In a further entry in the same volume, in a list headed “Societies”,
we find Bayes’s name in group 5 in both 1720 and 1721. (These were 
perhaps classes or tutorial groups.) In the list of “Prescribed Exegeses 
to be delivered” we have 


1721. Jan. 14. Mr. Tho: Bayes. the Homily. Matth. 7.24, 25, 26, 27. 
and 
1722. Ja. 20. Mr Tho: Bayes. a homily. Matth. 11. 29, 30. 


The final entry in this volume occurs in a list entitled “The names
of such as were students of Theology in the university of Edinburgh
and have been licensed and ordained since Nov. 1709. Those with the
letter .o. after their names are ordained, others licensed only.” Here
we find Bayes’s name, but without an “o” after it.


There is thus no doubt now that Bayes was educated at Edinburgh Uni- 
versity. There is unfortunately no record, at least in those records currently 
accessible, of any mathematical studies, though he does appear to have pur- 
sued logic (under Colin Drummond) and theology. 

That Bayes did not take a degree at Edinburgh is in fact not surprising. 
Grant [1884, vol. I] notes that “after 1708 it was not the interest or
concern of any Professor in the Arts Faculty ... to promote graduation ...
the degree [of Master of Arts] rapidly fell into disregard” [p. 265]. Bayes
was, however, licensed as a preacher, though not ordained. 

The manuscript volume in the library of Edinburgh University that con- 
tains the list of theologues also contains a list of books. The range of topics 
covered seems too narrow for this to be a listing of books in the Univer- 
sity library, and it is possible that the works listed were for the particular 
use of the theologues. But be that as it may: only two of these books are 
recognizable as being distinctly mathematical: they are 


(i) Keckermanni systema mathem: and, 
(ii) Speedwells geometrical problems. 


At least that is what appears to be written. The first is probably a book 
by Bartholomaeus Keckermann, who published other “systema” during the 
early part of the seventeenth century. The second work is most probably 
John Speidell’s A geometrical extraction, or a collection of problemes out 
of the best writers, first published in 1616 with a second edition appearing 
in 1657. 


Bayes’s Essay 


Et his principiis, via ad majora sternitur.


Isaac Newton.
Tractatus de Quadratura Curvarum.


3.1 Introduction 


As we have already mentioned, Bayes’s books and papers were demised — 
or so one is sometimes given to believe — to the Reverend William Johnson, 
his successor at the Pantile Shop at Little Mount Sion. Timerding [1908]
concludes that 


nach seinem Ableben betrauten seine Angehörigen Price mit
der Durchsicht seiner hinterlassenen Papiere, in denen verschiedene
Gegenstände behandelt waren, deren Veröffentlichung ihm
aber seine Bescheidenheit verboten hatte [p. 44]


but it is difficult to see, on the basis of Bayes’s posthumous publications, 
why he should have papers on “sundry matters” ascribed to him, and why 
his not publishing should be attributed (or even attributable) to a modesty
Miranda might well have envied.

Whether some, or all, of the papers were passed on to Richard Price, 
or whether he was merely called in by Johnson or Bayes’s executors to 
examine them, is unknown. However, on the 10th November 1763 Price 
sent a letter to John Canton that opens with the words


Dear Sir, I now send you an essay which I have found among 
the papers of our deceased friend Mr. Bayes, and which, in my 
opinion, has great merit, and well deserves to be preserved. 


It seems probable, therefore, that, apart from the Essay and a letter on
asymptotic series (published in 1764 in the Philosophical Transactions 53
(1763), pp. 269-271), Bayes left behind no other significant unpublished
mathematical work.

The Essay has undergone a number of reprintings since it was first
published. In view of this fact, I shall content myself with giving, in this
chapter, a fairly detailed discussion, in modern style and more geometrico,
of the Essay. The latter, divided into two sections, is preceded by Price’s
covering letter, and it is to this that we first turn our attention.


3.2 Price’s introduction 


Price clearly states [p. 370] that Bayes had himself written an introduction 
to the Essay. For reasons best known to himself, Price omitted forwarding 
this proem to Canton, contenting himself with giving, in his accompanying 
letter, a report of Bayes’s prefatory remarks. Here we find clearly stated 
the problem that Bayes posed himself, viz. 


to find out a method by which we might judge concerning the 
probability that an event has to happen, in given circumstances, 
upon supposition that we know nothing concerning it but that, 
under the same circumstances, it has happened a certain num- 
ber of times, and failed a certain other number of times. 

[pp. 370-371] 


Several points should be noted in this quotation: firstly, the event of 
current concern is supposed to take place under the same circumstances as 
it has in the past. This phrase is missing both from Bayes’s own statement 
of the problem [p. 376] and from his scholium [pp. 372 et seqq.]. Whether it
is in fact implicit in his Essay will be examined later in this work. Secondly, 
what does the phrase “judge concerning the probability” mean? Are we to 
understand by it that a specific value should be attached to the probability 
of the happening of the event, or merely that a (possibly vague) inference 
about the probability should be made? In Bayes’s statement of his problem, 
Edwards [1974, p. 44] finds the latter interpretation meant: we shall return 
to this point later. 

Continuing his reporting of Bayes’s introduction, Price points out that
Bayes noted that the problem could be solved (and that not with difficulty,
p. 371)


provided some rule could be found according to which we ought 
to estimate the chance that the probability for the happening of 
an event perfectly unknown, should lie between any two named 
degrees of probability, antecedently to any experiments made 
about it. [p. 371] 


Three points come to mind from this passage: firstly, we are required to 
estimate the chance of a probability. The difficulty that the word “judge” in 
an earlier quotation occasioned (as discussed in the preceding paragraph) 
presents itself again in the phrase “estimate the chance”: does this denote a 
point or an interval estimate? And is this estimate to be used for prediction? 
From the previous quotation this certainly seems to be the case, but, as 
we shall see later, the problem as posed by Bayes at the start of his Essay
is silent on this point, and the matter of prediction is only taken up in 
the Appendix, which is by Price. One can indeed but regret the latter’s 
suppression of Bayes’s own introduction. 

Secondly, note that the statement of the problem refers only to inference 
about “degrees of probability”: inference about an arbitrary parameter is 
not mentioned. And thirdly, the estimation is to be undertaken prior to 
any experimental investigation. 

We read further, in Price’s introduction, that Bayes’s first thought was 
that, for the solution to be effected, 


the rule must be to suppose the chance the same that it [i.e.
the probability p of the unknown event] should lie between any 
two equidifferent degrees [of probability] [p. 371] 


(i.e. $p_2 - p_1 = q_2 - q_1 \Rightarrow \Pr[p_1 < p < p_2] = \Pr[q_1 < p < q_2]$); the rest, he
believed, would then follow easily from “the common method of proceeding 
in the doctrine of chances” [p. 371]. (It seems, then, that a certain gener- 
ally received corpus of probability rules was already in use by this time.) 
In this quotation we see the origin of the notorious “Bayes’s postulate”,
an hypothesis whose tentative advocation (let alone definite adoption) has 
engendered more heat than light in numerous statistical and philosophical 
papers and proceedings. 
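
Read in modern terms, the rule reported by Price amounts to a uniform prior distribution for p. The short derivation below is offered only as a sketch of that reading, and not as a reconstruction of Bayes’s own argument: the function F, and the assumptions that p has no point masses and that F is non-decreasing on [0, 1], are introduced here purely for illustration.

```latex
% Assumptions (not in the Essay): F(x) = \Pr[0 < p < x], F non-decreasing,
% p has no point masses, and F(1) = 1.
\begin{align*}
&\text{Postulate: } p_2 - p_1 = q_2 - q_1
  \;\Rightarrow\; \Pr[p_1 < p < p_2] = \Pr[q_1 < p < q_2]. \\
&\text{Take } (p_1, p_2) = (x,\, x + y) \text{ and } (q_1, q_2) = (0,\, y): \\
&\qquad F(x + y) - F(x) = F(y), \qquad x, y \ge 0,\ x + y \le 1. \\
&\text{A non-decreasing solution with } F(1) = 1 \text{ must be } F(x) = x, \text{ whence} \\
&\qquad \Pr[p_1 < p < p_2] = p_2 - p_1, \qquad 0 \le p_1 \le p_2 \le 1.
\end{align*}
```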

Proceeding on this assumption, Bayes proposed “a very ingenious
solution of this problem”. Second thoughts, however, persuaded him that “the
postulate on which he had argued might not perhaps be looked upon by
all as reasonable”. Fisher [1956, pp. 9-10] was persuaded that it was the
realization of these doubts that prevented Bayes from publishing his essay 
during his lifetime (doubts apparently not shared by Price), though this is 
not suggested in Price’s covering letter. Indeed, the latter informs us that 
Bayes laid down “in another form* the proposition in which he thought 
the solution of the problem is contained” [p. 371], defending his reasons in 
a scholium. In §4.5 of the present work it is argued that Bayes’s original 
solution is given in his tenth proposition, the ninth, which is followed by the 
scholium, containing the alternative form. Karl Pearson, writing of Bayes’s 
initial postulate, says that, according to Price, “he [i.e. Bayes] rejected it 
and proceeded on another assumption” [Pearson 1978, p. 364]: but as I 
have already suggested, such a conclusion seems unwarranted. 

The importance of this problem was not lost on Price, and a long
paragraph [pp. 371-372] is devoted to a discussion of this matter. Price notes
here that the discussion of the present problem is necessary to determine 
“in what degree repeated experiments confirm a conclusion” [p. 372], and 
mentions further that the problem 


is necessary to be considered by any one who would give a clear
account of the strength of analogical or inductive reasoning.
[p. 372]

*Emphasis added.


Price concludes his comments on this point by saying 


These observations prove that the problem enquired after in 
this essay is no less important than it is curious. [p. 372] 


The problem that Bayes considered was new, or at least it had not
been solved before [p. 372]. Price mentions de Moivre’s improvement of
Bernoulli’s Law of Large Numbers, and sees in Bayes’s problem a converse
to this. Clearly, to de Moivre at least, Bayes’s problem was not as difficult
as the Law of Large Numbers [p. 373], yet it has undoubtedly been more 
eristic. De Moivre’s theorem was thought applicable to “the argument taken 
from final causes for the existence of the Deity” [Bayes 1763a, p. 374]: Price 
claims that the problem of the Essay is more suited to that purpose, 


for it shows us, with distinctness and precision, in every case 
of any particular order or recurrency of events, what reason 
there is to think that such recurrency or order is derived from 
stable causes or regulations in nature, and not from any of the 
irregularities of chance. [p. 374] 


The last two rules of the Essay were presented without their proofs, such 
deductions being, in Price’s view, too long: moreover the rules, Price claims, 
“do not answer the purpose for which they are given as perfectly as could be 
wished” [p. 374]. Price later published (in 1765) a transcription¹⁴ of Bayes’s
proof of the second rule, together with some of his own improvements. In 
connexion with the first rule he writes, in a covering letter to Canton, 


Perhaps, there is no reason about being very anxious about 
proceeding to further improvements. It would, however, be very 
agreeable to me to see a yet easier and nearer approximation 
to the value of the two series’s in the first rule: but this I must 
leave abler persons to seek, chusing now entirely to drop this 
subject. [p. 296] 


The improvements were in the main limited to a narrowing of the limits 
obtained by Bayes¹⁵.

Price also added short notes where he considered them necessary, and 
appended 


an application of the rules in the essay to some particular cases, 
in order to convey a clearer idea of the nature of the problem, 
and to show how far the solution of it has been carried [p. 374] 


any errors being his. 
Thus far Price’s introduction. 


3.3 The first section


Bayes’s Essay opens with a clear statement of the problem whose solution 
is proposed¹⁶:


Given the number of times in which an unknown event has 
happened and failed: Required the chance that the probability 
of its happening in a single trial lies somewhere between any 
two degrees of probability that can be named. [p. 376] 


This problem, says Savage in an unpublished note¹⁷,


is of the kind we now associate with Bayes’s name, but it is 
confined from the outset to the special problem of drawing the 
Bayesian inference, not about an arbitrary sort of parameter, 
but about a “degree of probability” only. [1960] 


In modern notation, the solution to this problem (given as Proposition 
10 in the Essay) can be expressed thus: 


\[
\Pr[x_1 < x < x_2 \mid p \text{ happenings and } q \text{ failures of the unknown event}]
= \frac{\int_{x_1}^{x_2} x^p (1-x)^q \, dx}{\int_0^1 x^p (1-x)^q \, dx} .
\]


Bayes, of course, gives the solution in terms of the ratio of areas of rect- 
angles, as Todhunter [1865, art. 547] notes. In his edition of Bayes’s Essay, 
Timerding [1908] explains this avoidance of the integral notation in the 
interesting (albeit faintly chauvinistic) sentence 


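Um Bayes’ Darstellung zu verstehen, muß man sich erinnern,
daß in England die Integralbezeichnung verpönt war, weil ihr
Urheber Leibniz als Plagiator Newtons galt. [p. 50]
[To understand Bayes’s presentation one must remember that in
England the integral notation was frowned upon, because its
originator Leibniz was regarded as having plagiarized Newton.]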


Before an attempt at solution is essayed, however, Bayes devotes
some pages to various definitions, propositions and corollaries in elementary
probability¹⁸. Price relates that Bayes


thought fit to begin his work with a brief demonstration of the 
general laws of chance. His reason for doing this, as he says in 
his introduction, was not merely that his reader might not have 
the trouble of searching elsewhere for the principles on which 
he has argued, but because he did not know whither to refer 
him for a clear demonstration of them. [p. 375] 


Now this is a somewhat curious statement. It is difficult to believe that 
Bayes was completely ignorant of de Moivre’s The Doctrine of Chances, 
of which three editions were published (in 1718, 1738 and 1756) during 
Bayes’s lifetime¹⁹. De Moivre was, moreover, elected to a fellowship of the
Royal Society in 1697, and since he did not die until 1754, it seems unlikely
that Bayes did not know of his work. The third edition of The Doctrine of 
Chances contained a 33 page Introduction explaining and illustrating the 
main rules of the subject. However, Bayes’s definition of probability differs 
from that of de Moivre²⁰, and this might well be the reason for the detailed
first section of the former’s Essay. 

The definition of probability given by Bayes, viz. 


the probability of any event is the ratio between the value at
which an expectation depending on the happening of the event 
ought to be computed, and the value of the thing expected upon 
it’s happening [p. 376] 


is slightly unusual²¹, as Bayes apparently realized himself since he chose to
give a definition of that sense of the word “which all will allow to be its 
proper measure in every case where the word is used” [p. 375]. 

We have already mentioned (§3.2) the possible ambiguity in Price’s 
use of the phrase “judge concerning the probability” in his statement of 
Bayes’s problem. Notice that Bayes, by using “chance” as synonymous²²
with “probability” [p. 376], failed to resolve the difficulty²³.

The rest of this first section of the Essay, following the definitions, is 
devoted to seven routine (at least by today’s standards) propositions and a 
number of corollaries, including a lucid definition of the binomial distribu- 
tion. One might note, however, that Bayes regarded the failure of an event 
as the same thing as the happening of its contrary [1763a, pp. 376, 383,
386], a view that has bearing on the question of additivity of degrees of
belief²⁴. Notice too that Bayes takes pains to point out that the happening
or failure of the same event, in different trials (i.e. as a result of certain
repeated data), is in fact the same thing as the happening or failure of as
many distinct independent events, all similar²⁵ [1763a, p. 383].


3.4 The second section 


Before we undertake any critical exegesis of this section, it might perhaps 
be advisable to reformulate certain parts of it in modern notation. Similar 
accounts have been given by Fisher, Barnard and Edwards²⁶, but it will be
useful to have a “translation” here also. 

This Section opens with two postulates²⁷. In the first of these it is suggested
that a level square table²⁸ be so made that a ball W thrown upon
it will have the same probability of coming to rest at any point as at any
other point²⁹. The second postulate is that this throw of the first ball is
followed by p + q or n throws of a second ball, each of these latter throws
resulting in the occurrence or failure of an event M according as to whether
the throw results in the second ball’s being nearer to or further from a specified
side of the table than is the first ball. Examination of Bayes’s proof
of the results of this Section shows that we may, without loss of generality,
express these postulates in the following form³⁰:


(i) a single value x is drawn from a uniform distribution concentrated
on [0,1], and

(ii) a sequence of Bernoulli trials, with probability x of success,
is generated.


FIGURE 3.1. Bayes’s square table, showing the abscissa x of the point at which
the first ball thrown comes to rest.


These postulates are followed by two lemmata that essentially provide their 
geometrization. 

Let us suppose, without loss of generality, that the square table is of
unit area, and let A have co-ordinates (0,0). Let x be the abscissa of the
point on the table at which the first ball comes to rest.


Lemma 1. For any $x_1, x_2$ such that $0 \le x_1 < x_2 \le 1$,

\[ \Pr[x_1 < x < x_2] = x_2 - x_1 . \]

Lemma 2. Suppose that the second ball is thrown once on the table. Then

\[ \Pr[\text{success}] = x . \]


Proposition 8. For any $x_1, x_2$ such that $0 \le x_1 < x_2 \le 1$,

\[
\Pr[x_1 < x < x_2 \ \&\ p \text{ successes and } q \text{ failures in } p+q = n \text{ trials}]
= \int_{x_1}^{x_2} \binom{p+q}{p} x^p (1-x)^q \, dx .
\]

It is not clear whether Bayes interpreted “x lies between A and B” in the
sense of included or excluded end points: I (like Edwards [1978]) have used
“$0 < x < 1$” rather than “$0 \le x \le 1$”, and similar statements, throughout
(the distinction is a fine one, of course, and of little significance here).


Corollary.

\[
\Pr[0 < x < 1 \ \&\ p \text{ successes and } q \text{ failures in } p+q \text{ trials}]
= \int_0^1 \binom{p+q}{p} x^p (1-x)^q \, dx \quad \left(= \frac{1}{n+1}\right) .
\]


Proposition 9. For any $x_1, x_2$ such that $0 \le x_1 < x_2 \le 1$,

\[
\Pr[x_1 < x < x_2 \mid p \text{ successes and } q \text{ failures in } p+q \text{ trials}]
= \frac{\int_{x_1}^{x_2} \binom{p+q}{p} x^p (1-x)^q \, dx}{\int_0^1 \binom{p+q}{p} x^p (1-x)^q \, dx} .
\]


Corollary.

\[
\Pr[x < x_2 \mid p \text{ successes and } q \text{ failures}]
= (n+1) \binom{p+q}{p} \int_0^{x_2} x^p (1-x)^q \, dx .
\]


Scholium³¹: suppose one knows how often a success has occurred (and how
often it has not occurred) in n trials. One may then “give a guess whereabouts
it’s probability is”, and hence (by the preceding proposition) find
“the chance that the guess is right” [Bayes 1763a, p. 392]. Bayes now asserts
that the same rule is to be used when considering an event whose
probability, antecedent to any trial, is unknown. In support of this assertion
he adduces the following argument (paraphrased here): let us suppose
that to know nothing of the (antecedent) probability is equivalent to being
indifferent between the possible number of successes in n trials (i.e. each
possible number of successes is as probable as any other)³². Writing of “an
event concerning the probability of which we absolutely know nothing antecedently
to any trials made concerning it” [pp. 392-393], Bayes in fact
goes on to say


that concerning such an event I have no reason to think that, 
in a certain number of trials, it should rather happen any one 
possible number of times than another. [p. 393] 


But, by the Corollary to Proposition 8, this is precisely the situation of the 
proposed model. 




In what follows therefore I shall take for granted that the rule 
given concerning the event M [i.e. success] in prop. 9. is also 
the rule to be used in relation to any event concerning the 
probability of which nothing at all is known antecedently to 
any trials made or observed concerning it. And such an event I 
shall call an unknown event. [pp. 393-394] 


Then, following a corollary in which, in essence, the table is assumed to be 
of unit area, one finds Proposition 10, which provides the solution to the 
problem initially posed³³:


Proposition 10. Let x be the (prior) probability of an unknown event A.
Then

\[
\Pr[x_1 < x < x_2 \mid A \text{ has happened } p \text{ times and failed } q \text{ times in } p+q \text{ trials}]
= \frac{\int_{x_1}^{x_2} \binom{p+q}{p} x^p (1-x)^q \, dx}{\int_0^1 \binom{p+q}{p} x^p (1-x)^q \, dx} .
\]


It should be noted that Proposition 9, framed as it is in terms of “table 
and balls thrown”, does not furnish the desired solution³⁴: the preceding
quotation provides the link between this result and that for the “unknown 
event” in Proposition 10. 

Having stated this proposition, in which the solution to the problem 
posed at the outset of his paper lies, Bayes finds its proof “evident from 
prop. 9. and the remarks made in the foregoing scholium and corollary” 
[p. 394]. He then turns his attention to the evaluation of the incomplete 
beta-integral³⁵ appearing in this proposition (or, for that matter, in the
ninth). The details of five Articles [pp. 395-399] are summarized in Rule 1 
as follows: 


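Rule 1.

\[
\Pr[x_1 < x < x_2 \mid p \text{ successes and } q \text{ failures}]
= (n+1)\binom{n}{p}\left\{
\left[\frac{x_2^{p+1}}{p+1} - \binom{q}{1}\frac{x_2^{p+2}}{p+2} + \binom{q}{2}\frac{x_2^{p+3}}{p+3} - \cdots\right]
- \left[\frac{x_1^{p+1}}{p+1} - \binom{q}{1}\frac{x_1^{p+2}}{p+2} + \binom{q}{2}\frac{x_1^{p+3}}{p+3} - \cdots\right]
\right\} .
\]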


This essentially completes Bayes’s contribution: the next few pages (up to 
p. 403) contain (in two further rules) particular methods of approximating 
the solution given in Rule 1, and are in the main due to Price. 
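
The series of Rule 1 is simply the term-by-term integral of the binomial expansion of $x^p(1-x)^q$, and it may be helpful to see it evaluated. The following Python sketch is mine and not part of the Essay: it assumes the uniform prior of the table model, and the function names `incomplete_integral` and `rule1` are illustrative only. Exact rational arithmetic is used so that no rounding intrudes.

```python
from fractions import Fraction
from math import comb


def incomplete_integral(X, p, q):
    """Integral of x^p (1-x)^q from 0 to X, expanding (1-x)^q binomially."""
    X = Fraction(X)
    return sum(
        (-1) ** k * comb(q, k) * X ** (p + k + 1) / (p + k + 1)
        for k in range(q + 1)
    )


def rule1(x1, x2, p, q):
    """Pr[x1 < x < x2 | p successes and q failures], uniform prior assumed."""
    n = p + q
    return (n + 1) * comb(n, p) * (
        incomplete_integral(x2, p, q) - incomplete_integral(x1, p, q)
    )


# One success and no failures: chance that x exceeds 1/2.
print(rule1(Fraction(1, 2), 1, 1, 0))  # 3/4
# Over the whole interval the chance must be 1.
print(rule1(0, 1, 3, 2))               # 1
```

The number of terms grows with q, and the alternating terms become large, which is one way of seeing why the rule was thought impractical for large p and q.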

Noting that the formula of Rule 1 is impractical for large values of p 
and q, Price states that Bayes deduced another expression, summarized in 
Rule 2 (which in turn was deduced “by an investigation which it would be 
too tedious to give here” [p. 400]) as follows: 
Rule 2. If nothing is known concerning an event but that it has happened p
times and failed q in p + q or n trials, and from hence I guess that
the probability of its happening in a single trial lies between p/n + z
and p/n − z; if $m^2 = n^3/pq$, a = p/n, b = q/n, E the coefficient
of the term in which occurs $a^p b^q$ when $(a + b)^n$ is expanded, and

\[
\Sigma = \frac{n+1}{n} \times \frac{\sqrt{2pq}}{\sqrt{n}} \times E a^p b^q \times
\]

by the series

\[
mz - \frac{m^3 z^3}{3} + \frac{n-2}{2n}\,\frac{m^5 z^5}{5}
- \frac{(n-2)(n-4)}{(2n)(3n)}\,\frac{m^7 z^7}{7}
+ \frac{(n-2)(n-4)(n-6)}{(2n)(3n)(4n)}\,\frac{m^9 z^9}{9} \ \text{\&c.}
\]

my chance to be in the right is greater than

\[ \frac{2\Sigma}{1 + 2Ea^pb^q + 2Ea^pb^q/n} \]

and less than

\[ \frac{2\Sigma}{1 - 2Ea^pb^q - 2Ea^pb^q/n} . \]

And if p = q my chance is $2\Sigma$ exactly.


[p. 400; notation slightly modernized.] The term $2Ea^pb^q/n$ occurring in the
denominator of each of the two last expressions was apparently omitted by
Bayes “evidently owing to a small oversight in the deduction of this rule”,
which oversight, Price goes on to say, “I have reason to think Mr. Bayes had
himself discovered” [p. 400]. A further culpa levis occurs in the definition
of $m^2$: it should be taken equal to $n^3/2pq$: this was pointed out by Price
in the paper of 1764 in the twenty-eighth article.
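
The final clause of Rule 2 (that the chance is exactly $2\Sigma$ when p = q) offers a convenient check on the correction to $m^2$. The following Python sketch is my own illustration, not a transcription of anything in the Essay or its Supplement: it assumes the series coefficients follow the pattern $1,\ 1,\ (n-2)/2n,\ (n-2)(n-4)/(2n \cdot 3n), \ldots$ with alternating signs, and the names `series`, `two_sigma` and `exact` are hypothetical. It compares $2\Sigma$ with the exact value obtained from the expansion of Rule 1.

```python
from fractions import Fraction
from math import comb, sqrt


def series(u, n, terms=60):
    """u - u^3/3 + (n-2)/(2n) u^5/5 - (n-2)(n-4)/(2n*3n) u^7/7 + ..."""
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += (-1) ** k * coeff * u ** (2 * k + 1) / (2 * k + 1)
        if k >= 1:  # the coefficients of the first two terms are both 1
            coeff *= (n - 2 * k) / ((k + 1) * n)
    return total


def two_sigma(p, q, z, corrected=True):
    """2*Sigma of Rule 2; corrected=True takes m^2 = n^3/(2pq)."""
    n = p + q
    m = sqrt(n ** 3 / ((2 if corrected else 1) * p * q))
    e_apbq = comb(n, p) * (p / n) ** p * (q / n) ** q
    sigma = (n + 1) / n * sqrt(2 * p * q) / sqrt(n) * e_apbq * series(m * z, n)
    return 2 * sigma


def exact(p, q, z):
    """Pr[p/n - z < x < p/n + z | p, q] by the expansion used in Rule 1."""
    n = p + q

    def integral(X):
        return sum((-1) ** k * comb(q, k) * X ** (p + k + 1) / (p + k + 1)
                   for k in range(q + 1))

    centre, z = Fraction(p, n), Fraction(z)
    return (n + 1) * comb(n, p) * (integral(centre + z) - integral(centre - z))


z = Fraction(1, 10)
print(float(exact(2, 2, z)))                # exact chance
print(two_sigma(2, 2, z))                   # agrees when m^2 = n^3/(2pq)
print(two_sigma(2, 2, z, corrected=False))  # does not agree when m^2 = n^3/pq
```

For even n the series terminates, so in the case p = q the comparison involves no truncation error.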

The third rule, “which is the rule to be used when mz is of some consid- 
erable magnitude” [p. 403], may, I suspect, be due to Price as it is stated: 
for whereas the latter is most punctilious in referring to Bayes in his (i.e. 
Price’s) discussion of the second rule, there is no direct mention of Bayes 
in the immediate preamble to the third rule. However Bayes did give a 
theorem for use when mz is large (see p. 402), a theorem whose application 
effects the desired modification of the second rule. 

In the Supplement to the Essay Price went into more detail. As he wrote 
in the accompanying letter to John Canton, 


I have first given the deduction of Mr. Bayes’s second rule chiefly 
in his own words; and then added, as briefly as possible, the 
demonstrations of several propositions, which seem to improve 
considerably the solution of the problem, and to throw light on 
the nature of the curve by the quadrature of which this solution 
is obtained. [Bayes, 1764, p. 296] 


Strictly speaking this brings us to the end of this section. However, 
Price’s remarks at the start of the Appendix are pertinent, and we ac- 
cordingly adduce them here. He begins by saying 


The first rule gives a direct and perfect solution in all cases; and 
the two following rules are only particular methods of approximating
to the solution given in the first rule, when the labour
of applying it becomes too great. [p. 404] 


Then follows a paragraph setting out more succinctly than before the cases 
(depending on the magnitudes of p, g and mz) in which the various rules 
may be used. 


3.5 The Appendix³⁶


The last fifteen pages contain some applications of the preceding rules. 

The first of these applications runs as follows: let M be an event concerning
whose probability (antecedently to any trials) nothing is known.
Denoting by $S_i$ the occurrence of M on the i-th trial, we have


(i) $\Pr[\tfrac{1}{2} < x < 1 \mid S_1] = \tfrac{3}{4}$ ;

(ii) $\Pr[\tfrac{1}{2} < x < 1 \mid S_1, S_2] = \tfrac{7}{8}$ ;

(iii) $\Pr[\tfrac{1}{2} < x < 1 \mid S_1, S_2, S_3] = \tfrac{15}{16}$ ;

(iv) $\Pr[\tfrac{1}{2} < x < 1 \mid p \text{ successes}] = (2^{p+1} - 1)/2^{p+1}$ ;

(v)³⁷ $\Pr[\tfrac{2}{3} < x < \tfrac{16}{17} \mid 10 \text{ successes and no failures}] = 0.5013$.
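
When no failures have occurred the posterior density reduces to $(p+1)x^p$, so that each of these chances is a difference of two powers. The Python sketch below is my own illustration (the name `chance_all_successes` is hypothetical); for case (v) it assumes that the limits are 2/3 and 16/17.

```python
from fractions import Fraction


def chance_all_successes(x1, x2, p):
    """Pr[x1 < x < x2 | p successes, no failures] = x2^(p+1) - x1^(p+1)."""
    return Fraction(x2) ** (p + 1) - Fraction(x1) ** (p + 1)


half = Fraction(1, 2)
for p in (1, 2, 3):
    print(p, chance_all_successes(half, 1, p))  # 3/4, 7/8, 15/16 in turn

# Case (v), under the assumed limits 2/3 and 16/17.
v = chance_all_successes(Fraction(2, 3), Fraction(16, 17), 10)
print(float(v))
```

Direct evaluation of (v) under these assumed limits gives a value within about 0.0005 of the figure 0.5013 quoted above.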


Price next goes on to consider a particularly noteworthy example: suppose
we have a die of unknown number of faces and unknown constitution
(it will not, I suppose, do any harm to suppose the faces numbered
$n_1, n_2, \ldots, n_k$, not necessarily distinct). The die is thrown once, the face
$n_i$ (say) resulting (which shows only that the die has this face). It is only
at this stage, i.e. after the first throw, that the situation of the Essay obtains;
the occurrence of $n_i$ in any subsequent trial being an event of whose
probability we are completely ignorant. If, at the second trial, $n_i$ appears
again, then by the first application, the odds will be three to one that
$n_i$ is favoured (either through being more numerous, or (equivalently) because
of the die’s constitution). We shall return to this matter in the next
chapter.

Price then emphasizes that improbability is not the same thing as impos- 
sibility, and goes on to discuss applications to “the events and appearances 
of nature” [p. 408]. Once again he takes pains to point out that the first 
experiment merely shows that some particular occurrence is possible: no 
notion of uniformity of nature is suggested, though further observations of 
the same occurrence may tend to support that view. As an illustration³⁸
Price cites the well-known example of the Rising of the Sun, emphasizing 
once again that a “previous total ignorance of nature” [p. 410] is required 
for the validity of his arguments. 
Having considered the case where only “successes” have occurred, Price
now turns his attention to the case in which either “success” or “failure”
may arise. As a particular illustration of the procedure, he considers a
lottery of unknown scheme in which the proportion of blanks to prizes is
unknown. Price in fact evaluates by Rules 1 to 3,
$\Pr[\tfrac{9}{10} < x < \tfrac{11}{12} \mid p \text{ blanks and } q \text{ prizes}]$
for various values of p and q, where x denotes the proportion
of blanks to prizes.

He concludes this Appendix by noting that 


what most of all recommends the solution in this Essay is, that 
it is compleat in those cases where information is most wanted, 
and where Mr. De Moivre’s solution of the inverse problem can 
give little or no direction; I mean, in all cases where either p or 
q are of no considerable magnitude [p. 418], 


and he emphasizes that, while it is fairly easy to see that 
Pr [success] : Pr [failure] :: p : q 


(for large values of p and q), the Essay demonstrates the folly of such a 
judgement when either p or q is small. 

In 1764 Price forwarded a supplement to the Essay to John Canton. In 
this paper, published in the volume of the Philosophical Transactions for 
1764, may be found proofs, and some development, of the Rules given in 
the Essay. This supplement will be considered in Chapter 5. 


3.6 Summary 


Before we pass on to a closer examination of the Essay, it might be useful 
to provide a recapitulation of its main results. From Price’s introductory 
remarks and Bayes’s own work one sees that the scheme of the Essay, and 
the thought prompting it, can be summarized as follows: 


Problem 1.³⁹ An event M has occurred (under the same circumstances) p
times and failed to occur q times. How can we estimate the probability 
of this event’s happening? 


The solution can be effected if one can solve 


Problem 2. Let P(M) denote the probability of the (perfectly unknown)
event M. For any $\alpha$ and $\beta$, with $\alpha < \beta$, what is $\Pr[\alpha < P(M) < \beta]$?
(This is to be determined before any experimentation.) 


This in turn can be solved by using 
Rule 1. If $\beta_1 - \alpha_1 = \beta_2 - \alpha_2$ then

\[ \Pr[\alpha_1 < P(M) < \beta_1] = \Pr[\alpha_2 < P(M) < \beta_2] , \]

i.e. a uniform distribution for P(M).
Being unhappy with this procedure, Bayes next considers


Problem 3. M has happened p times and failed to happen q times. For
any $\alpha$ and $\beta$, what is $\Pr[\alpha < P(M) < \beta \mid p, q]$?


Turning to the “table and balls” example, we see that $\theta$, the position of
the first ball on the horizontal axis, is distributed $U((0,1))$. If X denotes
the number of “successes” obtained in n throws of the second ball, then,
for a given $\theta$, $X \sim b(n, \theta)$. It then follows that (unconditionally) X has a
discrete uniform distribution on $\{0,1,2,\ldots,n\}$, i.e.

\[ \Pr[X = k] = 1/(n+1), \quad k \in \{0,1,2,\ldots,n\} . \]

Assuming that this holds for all k and n, we have in fact

\[ \theta \sim U((0,1)) \iff X \sim U(\{0,1,\ldots,n\}) . \]


Bayes proposes in his Scholium that the number of occurrences of the un- 
known event should be taken to have a discrete uniform distribution. 
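
The forward half of this equivalence is easily checked. The Python sketch below is an illustration of mine, with hypothetical names `beta_integral` and `marginal`; it relies on the standard value $a!\,b!/(a+b+1)!$ of the integral of $x^a(1-x)^b$ over (0,1) for non-negative integers a and b, and it says nothing about the converse implication.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Integral of x^a (1-x)^b over (0,1), for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def marginal(k, n):
    """Pr[X = k] when theta ~ U((0,1)) and X | theta ~ b(n, theta)."""
    return comb(n, k) * beta_integral(k, n - k)


n = 6
print([marginal(k, n) for k in range(n + 1)])  # every entry equals 1/7
```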
Commentary on Bayes’s
Essay 


The labours of others have raised for us 
an immense reservoir of important facts. 


Charles Dickens, Pickwick Papers. 


4.1 Introduction 


In the preceding chapter several points, arising from Bayes’s Essay, were 
either glossed over or omitted altogether. It is now time to fill in these 
lacunae, though certain of the topics to be discussed here will in fact un- 
dergo further development later in this tractate (in particular, we shall not 
consider here any elaboration of the main results of the Essay, and the 
Supplement to the Essay will be dealt with in Chapter 5). 


4.2 Price’s introduction 


In his statement of Bayes’s problem, Price says (see §3.2) that the event 
whose probability is sought should be known to take place “under the same 
circumstances” [pp. 370-371] as it occurred under in the past. According 
to Price, this phrase was in fact used by Bayes in his own (suppressed) 
introduction to the Essay: it is, however, missing from the statement of 
the problem on p. 376, although its implicit assumption is made clear, I 
believe, from the following observations. 

In his postulate at the end of Section II, Bayes refers to “the happening 
of the event M in a single trial” [p. 385], and the word “trials” appears in 
each of the propositions of that section. But not for Bayes any escape from 
the precise meaning called for of this word: he grasps the nettle firmly, and 
in the first part of his Essay we find the following passage: 
Definition. If in consequence of certain data there arises a probability
that a certain event should happen, its happening or failing,
in consequence of these data, I call its happening or failing
in the 1st trial. And if the same data be again repeated, the
happening or failing of the event in consequence of them I call 
its happening or failing in the 2d trial; and so on as often as 
the same data are repeated. [p. 383] 


It is, I think, quite clear from this quotation that the conditions under 
which the event of current concern takes place are supposed to be the same 
as those under which it happened in the past. (Such an assumption is of 
course frequently tacit in this sort of work: that Bayes bothers to state it — 
and that most carefully — is surely a tribute to the rigour of his thinking, 
if not indeed to his mathematical ability.) 

A more difficult matter, also stemming from Price’s statement of the 
problem, arises in connexion with the phrase “judge concerning the prob- 
ability” (see §3.2), a phrase that it is expedient to consider in conjunction 
with his later one “to estimate the chance that the probability...” Two in- 
terpretations of the first phrase are possible, as Edwards [1974] notes in 
the following words: 


Does ‘judge concerning the probability’ mean ‘attach a specific 
value to the probability of the next event’ or does it mean ‘make
an inference — possibly vague — about the probability’? [p. 44] 


If the phrase “of the next event” may be assumed to qualify the last word in 
this quotation, then there can, I think, be little doubt that Price intended 
the latter interpretation (the justification for this assertion may become 
more apparent when, in a later section in this chapter, we consider Price’s 
applications of the results of the Essay). 

Moreover, in view of Bayes’s own statement of the problem he proposed 
to solve (see §3.3) and his words “by chance I mean the same as probabil- 
ity” [p. 376], it seems to me, as it indeed did to Edwards, that Bayes was in 
fact only interested in an inference (possibly vague) about the probability: 
the second of Price’s introductory phrases quoted above also supports this 
view, I suggest.

It is perhaps significant, though I do not wish to urge the point, that, 
according to the order in which the comments are reported by Price 
[pp. 370, 371], the first idea was to find out a method by which we might 
“judge concerning the probability” [p. 370] (i.e. a possibly vague inference), 
and then that this could be done by estimating the chance of the probabil- 
ity’s being between any two degrees of probability. 


4.3 The first section


Remarks on this part of the Essay are wide-ranging. Todhunter [1865, art. 
544] describes it as “excessively obscure”, and he comments further¹ that
it “contrasts most unfavourably with the treatment of the same subject by 
De Moivre.” Savage [1960] finds in this section “a whole short course on 
probability”, and he provides a paraphrase of it on pp. 2-3 of his unpub- 
lished note. Stigler [1982a, p. 250] sees here “an intriguing development of 
rules of probability, most of which we would now regard as elementary”. A 
useful summary is given in Dinges [1983, pp. 75-80], in which work it is also
suggested that “Wir möchten Th. Bayes gerne als ersten Zeugen für einen
theoretischen Wahrscheinlichkeitsbegriff in Anspruch nehmen” [“We should
like to claim Th. Bayes as the first witness for a theoretical concept of
probability”] [p. 94].

The section opens with seven definitions. While these are unexception- 
able (although some may perhaps be slightly unusual), two points are worth 
noting. The first of these concerns Bayes’s definition of the probability of 
an event in terms of expectation (see §3.3). This is certainly different to the 
(more usual) definition given by de Moivre², and the fact that Bayes’s problem
required such an approach for its solution might well be the cause of
his giving his own “probability primer” and of Price’s statement that Bayes
“did not know whither to refer him [i.e. the reader] for a clear demonstration
of them [i.e. the principles on which Bayes argued]” [p. 375]. The
second point to be noted concerns Bayes’s definition of independence³. The
seventh definition reads as follows: 


Events are independent when the happening of any one of them 
does neither increase nor abate the probability of the rest. 


[p. 376] 


It might seem, then, that Bayes saw no distinction between “indepen- 
dence” and “pairwise independence” (see Savage [1960]). However D.V. 
Lindley⁴ has suggested that Savage was possibly wrong on this point. His
argument runs as follows: let $I(E)$ denote the indicator function of the
event E, so that $I(E) = 1$ means that the event E has occurred. For three
events Bayes’s definition of independence can be written as

\[ \Pr[I(A)\, I(B) \mid I(C)] = \Pr[I(A)\, I(B)] , \]

only when $I(C) = 1$, and Lindley charitably suggested that this should also
be taken to hold when $I(C) = 0$. Since Bayes implies that his definition
holds when B and C are interchanged, it would follow that

\[ \Pr[I(A)\, I(C) \mid I(B)] = \Pr[I(A)\, I(C)] , \]

and summation over $I(C)$ would yield

\[ \Pr[I(A) \mid I(B)] = \Pr[I(A)] . \]

Hence

\begin{align*}
\Pr[I(A)\, I(B)\, I(C)] &= \Pr[I(A)\, I(B) \mid I(C)] \, \Pr[I(C)] \\
&= \Pr[I(A)\, I(B)] \, \Pr[I(C)] \\
&= \Pr[I(A) \mid I(B)] \, \Pr[I(B)] \, \Pr[I(C)] \\
&= \Pr[I(A)] \, \Pr[I(B)] \, \Pr[I(C)] ,
\end{align*}

which is the usual definition of independence.
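
That the condition used in this argument is genuinely stronger than pairwise independence may be seen from a standard example, which I add here purely by way of illustration (it is not drawn from Bayes, Savage or Lindley). Two fair coins are tossed; A is “the first shows heads”, B is “the second shows heads”, and C is “the two tosses agree”. The Python sketch below enumerates the four equally likely outcomes; the helper `pr` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # four equally likely outcomes


def pr(event, given=lambda w: True):
    """Probability of `event`, conditional on `given`, by enumeration."""
    conditioned = [w for w in outcomes if given(w)]
    return Fraction(sum(1 for w in conditioned if event(w)), len(conditioned))


A = lambda w: w[0] == "H"
B = lambda w: w[1] == "H"
C = lambda w: w[0] == w[1]
AB = lambda w: A(w) and B(w)

# Pairwise independence holds for each pair of events.
print(pr(AB) == pr(A) * pr(B))
print(pr(lambda w: A(w) and C(w)) == pr(A) * pr(C))
print(pr(lambda w: B(w) and C(w)) == pr(B) * pr(C))

# The condition Pr[AB | C] = Pr[AB] fails, as does the product rule for all three.
print(pr(AB, given=C), pr(AB))
print(pr(lambda w: AB(w) and C(w)), pr(A) * pr(B) * pr(C))
```

The three events are pairwise independent, yet the happening of C does “increase” the probability of A and B jointly, so that they are not independent under the reading of Bayes’s definition given above.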

The rest of this section is devoted to seven propositions⁵, five corollaries
and a further definition. The first proposition states (in modern terminology)
that, if $\{E_i\}$ is a sequence of mutually exclusive events, then
$\Pr[\cup E_i] = \sum \Pr[E_i]$. Bayes was apparently the first to state this fact⁶.

Propositions 3 (and its corollary) and 5 require comment, and are ac- 
cordingly given here: 


Proposition 3. The probability that two subsequent events will 
both happen is a ratio compounded of the probability of the 1st, 
and the probability of the 2d on supposition the 1st happens. 
[p. 378] 

Corollary. Hence if of two subsequent events the probability of 
the 1st be a/N, and the probability of both together be P/N,
then the probability of the 2d on supposition the 1st happens
is P/a. [p. 379]

Proposition 5. If there be two subsequent events, the probability
of the 2d b/N and the probability of both together P/N, and it
being 1st discovered that the 2d event has happened, from hence
I guess that the 1st event has also happened, the probability I
am in the right is P/b. [p. 381]


At first sight the Corollary to Proposition 3 and Proposition 5 appear to
be saying the same thing. Thus if $E_1$ and $E_2$ are two events ($E_1$ preceding
$E_2$ in time), one might be tempted to phrase these two results in modern
notation as

\begin{align*}
\Pr[E_2 \mid E_1] &= \Pr[E_1 \cap E_2] / \Pr[E_1] \\
\Pr[E_1 \mid E_2] &= \Pr[E_1 \cap E_2] / \Pr[E_2] .
\end{align*}

But since it is “well-known” and “universally accepted” that “the timing of
events is irrelevant to the concept of conditional probability” (Shafer [1982,
p. 1076]), one might well be perplexed at Bayes’s deliberateness⁷. Shafer
(op. cit.) has forcefully argued that while an argument using rooted trees 
can establish the validity of Bayes’s Corollary to Proposition 3, such an 
argument fails to establish Proposition 5. Since the latter in turn is crucial 
in the proof of Proposition 9, Shafer’s thrust is to the very heart of the 
Essay. 
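
The arithmetic of the two results (though not, of course, the question of timing that exercises Shafer) may be illustrated by a small example of my own devising. Two dice are thrown in succession, so that N = 36; the first event is that the first die shows a six, and the second is that the total is at least nine. The Python sketch below counts a, b and P in Bayes’s notation and forms the ratios P/a and P/b.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # N = 36 equally likely outcomes
N = len(rolls)

first = lambda r: r[0] == 6           # the 1st event: first die shows a six
second = lambda r: r[0] + r[1] >= 9   # the 2d event: total at least nine

a = sum(1 for r in rolls if first(r))
b = sum(1 for r in rolls if second(r))
P = sum(1 for r in rolls if first(r) and second(r))

corollary_to_prop3 = Fraction(P, a)   # probability of the 2d given the 1st
prop5 = Fraction(P, b)                # probability of the 1st given the 2d

print(N, a, b, P)                     # 36 6 10 4
print(corollary_to_prop3, prop5)      # 2/3 2/5
```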
However, if we view Bayes’s fifth definition in terms of subjectively determined
values of expectations⁸, on Shafer’s own admission “the fifth proposition
would then become merely a subjective version of the third” [op. cit.
p. 1086]. 

In the fifth proposition Bayes introduces as a new factor the order in 
which we learn about the happening of the events. Shafer [1982] concludes 
that this result 


seems unconvincing unless we assume foreknowledge of the con- 
ditions under which the discovery of B’s [the second event’s] 
having happened will be made. [p. 1080] 


Such foreknowledge I believe obtains in cases in which Bayes uses this re- 
sult, and I believe therefore that it is correct — but let the reader of the 
Essay decide for himself⁹.

The other propositions of this Section do not seem excitatory: the defi- 
nition following Corollary 2 to Proposition 6 has already been mentioned 
(see §4.2). 


4.4 The second section 


An examination of the relevant propositions shows that Bayes states the 
results of this section¹⁰ sometimes in terms of “the probability that the
point o should fall [in a certain interval]” and sometimes in terms of “the
probability of the event M [is in a certain interval]”¹¹. Here, as in §3.4, os
denotes the line on which the first ball W comes to rest when it is rolled: 
the resting of the second ball O between AD and os — see Figure 4.1 — 
after a single throw is called the happening of M in a single trial. Thus 
Proposition 8 and its corollary fall in the first category, Proposition 9 falls 
in both categories, its corollary falls in the second, and Proposition 10 is 
framed in terms of the probability of the (an?) unknown event. 

As Edwards [1978] has noted, these two methods of formulation are, on 
Bayes’s assumptions, identical: however, even if the first ball is not uni- 
formly distributed, the distribution of the probability will still be uniform. 
This fact may be demonstrated as follows: suppose the first ball to have 
the distribution dF(-). Then, by the second part of Bayes’s postulate, the 


associated probability is 
G= i dF (x). 
0 


Thus dé = dF(z), and @ has a uniform distribution’*. Edwards [1974] 
makes the reasonable deduction that 


probably what happened was that Bayes realised he would need
to postulate a uniform prior distribution in order to solve his
main problem, and so generated one in his model for that prob-
lem. Then he realised that rolling subsequent balls would give
the probabilities he wanted for success and failure, but he failed
to notice that this would be so even in the event of a non-
uniform table. [p. 46]


FIGURE 4.1. The ball W is thrown on the square table ABCD and comes to
rest on the line os. A second ball O is then thrown onto the table, its resting on
any toss between AD and os being the happening of M.
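
The uniformity of θ on a non-uniform table may be checked by simulation. The following Python sketch is an illustration only and forms no part of the argument of Bayes or of Edwards: it assumes a hypothetical table on which the resting position of the first ball has distribution function F(x) = x² (the larger of two independent uniform variates), and the names `first_ball_position`, `F` and `decile_frequencies` are introduced here purely for the purpose.

```python
import random


def first_ball_position(rng):
    """Resting position of the first ball on a hypothetical non-uniform table.

    Assumption for illustration: the position is the larger of two
    independent uniform variates, so that its distribution function
    is F(x) = x**2 on [0, 1].
    """
    return max(rng.random(), rng.random())


def F(x):
    """Distribution function of the position under the assumption above."""
    return x * x


def decile_frequencies(values):
    """Relative frequency of the values falling in each tenth of [0, 1]."""
    counts = [0] * 10
    for v in values:
        counts[min(int(v * 10), 9)] += 1
    return [c / len(values) for c in counts]


rng = random.Random(2024)  # fixed seed so that the run is reproducible
positions = [first_ball_position(rng) for _ in range(100_000)]
thetas = [F(x) for x in positions]

position_deciles = decile_frequencies(positions)  # visibly non-uniform
theta_deciles = decile_frequencies(thetas)        # close to 1/10 each
```

Under these assumptions the positions crowd towards the upper end of the table, while the relative frequencies of θ = F(x) should all lie close to 1/10.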


If we interpret all the propositions from the point of view of the prob-
ability of the first ball’s being in a certain interval, then, with the further
assumption of a non-uniform table, the only changes necessitated are the
replacements of the limits θ₁ and θ₂ respectively by the integrals

\[ \int_0^{x_1} dF(x) \quad \text{and} \quad \int_0^{x_2} dF(x) , \]

where x₁ and x₂ are the limits of the interval within which the ball lies.

Edwards [1978, p. 117] points out that even if the table is not uniform,
the corollary to Proposition 8 is still valid. Indeed, suppose one, and then
a further n (distinct) values are drawn from a distribution. Denoting by
“success” (S) the event that one of the n values is less than the first one,
and by “failure” (F) the event that one of the n values is greater than the
first (the respective probabilities now being θ and 1 − θ), then, assuming
that the values are independently chosen, we have

\[ \Pr[x \text{ S's and } y \text{ F's} \mid \theta] = \binom{n}{x} \theta^x (1-\theta)^y , \]

where x + y = n. Hence

\[ \Pr[x \text{ S's and } y \text{ F's}] = \int_0^1 \binom{n}{x} \theta^x (1-\theta)^y \, d\theta = 1/(n+1) . \]
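
The value 1/(n + 1) can be confirmed by exact arithmetic. The sketch below is an illustration under the assumption that θ is uniform on (0, 1); it relies on the standard evaluation of the integral of θ^x (1 − θ)^y over the unit interval as x! y!/(x + y + 1)!, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of t**a * (1 - t)**b over [0, 1], integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def marginal_probability(n, x):
    """Pr[x successes and n - x failures] when theta is uniform on (0, 1)."""
    return comb(n, x) * beta_integral(x, n - x)


n = 6
probabilities = [marginal_probability(n, x) for x in range(n + 1)]
```

Each entry of `probabilities` is the same fraction, whatever the number of successes.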

There is nothing in Bayes’s Essay to say that the square table ABCD is 
of unit area. This “normalization” has in fact been carried out in the state- 
ment of the results in Chapter 3: it might, however, be of some interest 
to discuss the formulation in more detail. Thus on rewriting the results of 
this Section in a more modern notation than that adopted by Bayes (and 
not assuming ABCD to be of unit area) we obtain (see Figure 4.1) the
following:

Lemma 1. Pr[b < o < f] = (f − b)/AB.

Lemma 2. Pr[M in a single trial | W] = Pr[1 success | W] = Ao/AB.

Proposition 8. Let $y = E x^p r^q$, where $E = \binom{p+q}{p}$. Then

\[ \Pr[b < o < f \;\&\; p,q] = \int_b^f E x^p r^q \, dx \Big/ \text{area } ABCD . \]

(This proposition will be discussed in more detail later in this section.¹³)
From Bayes’s Essay [p. 388] we have

\[ y = bm/AB , \quad x = Ab/AB , \quad r = Bb/AB . \]

Corollary.

\[ \Pr[A < o < B \;\&\; p,q] = \int_A^B E x^p r^q \, dx \Big/ \text{area } ABCD . \]

On p. 393 of the Essay it is pointed out (in a reference to “art. 4”, which
in turn can be found on p. 398) that (in essence) in the case of the unit
square this corollary yields 1/(n + 1), independent¹⁴ of x.

Proposition¹⁵ 9.

\[ \Pr[Ab/AB < P(M) < Af/AB \mid p,q] = \Pr[b < o < f \mid p,q] = \int_b^f E x^p r^q \, dx \Big/ \int_A^B E x^p r^q \, dx , \]

where P(M) denotes the probability of M.


Corollary. 


oO B 
Pr[Ab/AB < P(M) <ol pa] = f paretds | | Ez? ri dz . 
b A 


Proposition 10. Let N be an “unknown event” with probability P(N). Then

\[ \Pr[Af/AH < P(N) < At/AH \mid p,q] = \int_f^t E x^p r^q \, dx \Big/ \int_A^H E x^p r^q \, dx . \]

I now propose to examine the “transliteration” of Proposition 8 in more
detail. Notice firstly that

\[ \Pr[x_1 < x < x_2 \;\&\; p,q] = \int_{x_1}^{x_2} f(p,q \mid x) f(x) \, dx . \]

Recalling that f(x) is uniform here, and that $f(p,q \mid x) = E x^p (1-x)^q$,
where $E = \binom{p+q}{p}$, we obtain

\[ \Pr[x_1 < x < x_2 \;\&\; p,q] = \int_{x_1}^{x_2} E x^p (1-x)^q \, dx , \]

the usual result for the unit square. Now let x = y/B in the integrand.
Then

\[ \begin{aligned}
\Pr[x_1 < x < x_2 \;\&\; p,q] &= \int_{Bx_1}^{Bx_2} E (y/B)^p (1 - y/B)^q \, dy/B \\
&= \int_b^f E (y/B)^p (1 - y/B)^q \, dy/B \\
&= \int_b^f (z/B) \, dy ,
\end{aligned} \]

where $z = E (y/B)^p (1 - y/B)^q$. Now z being bm/B (see Proposition 8), the
integral of z/B from b to f is not the area under the curve in Figure 4.1.


FIGURE 4.2. The figure used by Bayes in the proof of his Proposition 10.


This area is in fact

\[ \int_b^f h \, dy = B \int_b^f z \, dy \]

(because the height h = Bz)¹⁶. Also h is such that


B 1 
/ hdy=t= [ zdy=1 
7) 0 


f 
ri hdy 
b 


Thus

Hence finally

\[ \Pr[x_1 < x < x_2 \;\&\; p,q] = (1/B^2) \times \text{area under curve from } b \text{ to } f = (1/B^2) \int_b^f h \, dy \]

(the latter integral “being” the usual “area under a curve” one)¹⁷. This
is Bayes’s result, of which the remaining results are fairly obvious conse-
quences.

We see, then, that there is no loss of generality in considering the square 
table as being of unit area. 


4.5 The postulate and the scholium 


That Bayes himself presented an argument in defence of (or, better, as 
justification for) his postulate, although apparently generally ignored, is a 
fact that has actually frequently been emphasized. One of the most recent 
to stress this point was Stigler [1982a, p. 250], [1986a, p. 127 et seqq.], and 
before him we find the point made by Molina [1930, pp. 382-383], [1931, 
§IV], Savage [1960] and Edwards [1974, p. 47]. 

The positioning of the scholium should be noted: its appearing after the 
corollary to Proposition 9 but before Proposition 10 perhaps lends weight 
to our earlier assertion that the proposition that provides the answer to 
Bayes’s problem is the tenth and not the ninth. 

The scholium may be paraphrased as follows: from Proposition 9 (writes 
Bayes) it is clear that, given the number of times the event M happens and 
fails in a certain number of trials, “one may give a guess whereabouts 
it’s probability is, and, by the usual methods computing the magnitudes 
of the areas there mentioned, see the chance that the guess is right” 
[p. 392]. This same rule is to be applied to an event about whose probability 
we are completely ignorant prior to any trials being made; for “concerning 
such an event I have no reason to think that, in a certain number of trials, 
it should rather happen any one possible number of times than another” 
[p. 393]. This being so, one may reason that its probability was at first 
“unfixed”, and then determined in such a way “as to give me no reason 
to think that, in a certain number of trials, it should rather happen any 
one possible number of times than another” [p. 393]. But this is exactly 
the case of the event M (see the corollary to Proposition 8). “Hence the 
model of a uniform prior distribution for p represents complete absence of 
knowledge about p” [Edwards 1978, p. 117]. Finally, Bayes writes¹⁸


In what follows therefore I shall take for granted that the rule 
given concerning the event M in prop. 9. is also the rule to
be used in relation to any event concerning the probability of
which nothing at all is known antecedently to any trials made 
or observed concerning it. And such an event I shall call an 
unknown event. [pp. 393-394] 


To complete Bayes’s argument successfully a converse property must 
needs be established: viz., none other than the uniform distribution for p 
has the property of the corollary to Proposition 8. As Murray [1930] has it 


the assumption “all values of p are equally likely” is equivalent 
to the assumption “any number x of successes in n trials is just
as likely as any other number y, x ≤ n, y ≤ n”. [p. 129]


In his elegant note Murray verified that part of this quotation that was not 
proved by Bayes. A shorter proof than his would be provided by noting that 
the uniform distribution does yield the appropriate sequence of moments, 
and then using the uniqueness theorem for moment generating functions¹⁹.

The argument may be put as follows: our aim is the determination of
the (unique) cumulative distribution function F(·) that satisfies

\[ \int_0^1 \binom{n}{x} p^x (1-p)^{n-x} \, dF(p) = \frac{1}{n+1} , \tag{1} \]

where n ∈ ℕ and x ∈ {0,1,...,n}. For x = n, equation (1) becomes

\[ \int_0^1 p^n \, dF(p) = \frac{1}{n+1} , \tag{2} \]

and hence all moments of F are known.

Now it is well known that the Hausdorff moment problem

\[ \mu_n = \int_0^1 t^n \, d\psi(t) \]

has a solution if and only if all the differences $\Delta^k \mu_n$ are non-negative, where
k, n ∈ ℕ ∪ {0} and where

\[ \Delta^0 \mu_n = \mu_n , \qquad \Delta^k \mu_n = \sum_{j=0}^{k} \binom{k}{j} (-1)^j \mu_{n+j} \]

(see Shohat and Tamarkin [1970, p. 9]). In the case under consideration
here, with $\mu_n$ defined by the common value in (2), we have

\[ \Delta^k \mu_n = \int_0^1 p^n (1-p)^k \, dF(p) \geq 0 . \]

Thus our version of the Hausdorff moment problem has a solution F, and
by Theorem 203 of Hardy [1949/1991], this solution is unique²⁰. Moreover,
since the nth moment of F is equal to 1/(n + 1) for every n, it follows that
F is the distribution function of a random variable having a uniform
distribution²¹, i.e.

\[ F(p) = \begin{cases} 0 , & p \leq 0 \\ p , & 0 < p < 1 \\ 1 , & p \geq 1 . \end{cases} \]


Using f generically to denote a probability density function, we have in
general

\[ \int f(x \mid p) \, dF(p) = f(x) . \tag{3} \]

On taking

\[ f(x \mid p) = \binom{n}{x} p^x (1-p)^{n-x} , \quad x \in \{0,1,\ldots,n\} , \]

we find on combining (1) and (3) that

\[ f(x) = 1/(n+1) : \]

that is, X is unconditionally distributed uniformly over {0,1,...,n}. Thus,
as we have in fact already seen in §3.6,

\[ P \sim U(0,1) \iff X \sim U(\{0,1,\ldots,n\}) . \tag{4} \]
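
Two facts used in this argument, namely that the differences formed from the moments 1/(n + 1) are non-negative and that they agree with the integral of p^n (1 − p)^k over the unit interval, lend themselves to a quick check in exact arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and only a finite range of k and n is examined, so that it confirms rather than proves the claim.

```python
from fractions import Fraction
from math import comb, factorial


def moment(n):
    """n-th moment of the uniform distribution on [0, 1]."""
    return Fraction(1, n + 1)


def hausdorff_difference(k, n):
    """Delta^k mu_n = sum over j of C(k, j) * (-1)**j * mu_(n+j)."""
    return sum(comb(k, j) * (-1) ** j * moment(n + j) for j in range(k + 1))


def beta_integral(n, k):
    """Exact integral of p**n * (1 - p)**k over [0, 1]."""
    return Fraction(factorial(n) * factorial(k), factorial(n + k + 1))


pairs = [(k, n) for k in range(8) for n in range(8)]
differences_match = all(
    hausdorff_difference(k, n) == beta_integral(n, k) for k, n in pairs
)
differences_positive = all(hausdorff_difference(k, n) > 0 for k, n in pairs)
```

Exact rational arithmetic is used here so that the alternating sums are not disturbed by rounding.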


It is perhaps worthwhile to draw attention in passing to the dependence 
of the equivalence in (4) on the integrand in (1). For example, if we suppose 
that 


\[ f(x \mid p) = \binom{x-1}{n-1} p^n (1-p)^{x-n} , \quad x \in \{n, n+1, \ldots\} \]


(i.e. a negative binomial distribution), then, with F uniform, 


A (=> 1 )e"a —p) "dp = eT 


and

as before. However

\[ f(x) = n/x(x+1) , \quad x \in \{n, n+1, \ldots\} , \]

and so X is no longer (unconditionally) uniformly distributed²².
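
The marginal distribution n/x(x + 1) may be verified exactly. The sketch below is an illustration assuming P uniform on (0, 1) and the negative binomial form given above; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of p**a * (1 - p)**b over [0, 1], integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def negative_binomial_marginal(n, x):
    """Pr[X = x], X being the trial on which the n-th success occurs,
    when the probability of success is uniform on (0, 1)."""
    return comb(x - 1, n - 1) * beta_integral(n, x - n)


n = 3
marginal = {x: negative_binomial_marginal(n, x) for x in range(n, 1000)}
```

The values decrease with x, and their partial sums telescope: the sum over x from n to M is 1 − n/(M + 1).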

A conclusion that may be drawn from these remarks is the following: 
had Bayes changed his problem slightly, considering the number of trials 
needed before the occurrence of the nth success (with n fixed) rather than 
fixing the number of trials, he would have been unable to switch from a 
uniform distribution on P to one on X or vice versa²³: this could well have
spared us much of the controversy we have been exposed to (for decades,
if not for centuries) on “Bayes’s Postulate”.

Several writers have of course disparaged Bayes’s argument²⁴: it might be
of interest to look at the discussion presented by Hacking [1965], according 
to whom Bayes argued as follows: 


(i) Before any trials on the billiard table²⁵, and before the point
o is discovered, there is no reason to suppose M will happen any
number of times rather than any other possible number — and,
he might have added, there is no reason to prefer any value of
P(M) [the probability of M] to any other²⁶. (ii) Exactly the
same is true of the event E, in the case that no parent set-up is
known. (iii) Betting rates should be a function of the available
data: when all the information in the two situations is formally
identical, the betting rates must be identical. (iv) In all that
matters, the data in the case of E and M are identical. (v)
The initial distribution of betting rates for P(M) is uniform: it 
assigns equal rates to equal intervals of possible values of P(M). 
Therefore, (vi) this should also be the initial distribution of 
betting rates for P(E). [pp. 199-200] 


To pin-point the fallacy (as he sees it) in Bayes’s reasoning, Hacking poses 
the following dilemma: 


Interpretation A: (v) does not follow from (i) directly, but is the
consequence of the fact that the table is so made and levelled,
that the long run frequency with which the ball falls in any
area is equal to the long run frequency with which it falls in
any other area; we infer (v) from this fact plus assumption (3)
[viz. when the chance (long run frequency) of getting outcome
E on some trial of kind K from some set-up X is known to be
p, and when this is all that is known about the occurrence of
E on that trial, then the fair rate for betting on E should be
p : 1 − p].

Interpretation B: (v) does follow from (i) directly. [p. 200]


Although “most readers since the time of Laplace have favoured B”
[p. 200], Hacking believes that Bayes probably meant A — otherwise, why
would he have taken such pains in his Essay to compare E to M? If indeed
(v) follows from (i) directly (as suggested in Interpretation B), then (vi)
follows from (ii) directly, and there would then be no need for any mention
of M.

However, under Interpretation A the argument is fallacious. If Hacking’s 
assumption (3) and certain facts about frequency are required for (v), then 
data about M must be used that are not available for E, and so (iv) must
be false (which of course means that the demonstration itself is false). 

Interpretation B is similarly discredited, since lack of reason for sup- 
posing P(M) to be in one short interval rather than another of the same 
size should not entail that the betting rate on equal intervals should be in 
proportion to their size. As an illustration of his point Hacking cites the 
well-known (though perhaps somewhat shabby) example due to Fisher, 
in which the assumption that nothing is known about P(M) leads to a 
similar assumption about arcsin P(M), and hence to the observation that
“betting rates should be proportional to angular size” [Hacking 1965,
p. 200]. Of course, as Edwards [1978, p. 118] notes, “such a change would
upset the equal probabilities for all the values of a [the number of suc-
cesses]”. Moreover, it might be disputed whether ignorance of P(M) implies
ignorance of arcsin P(M): indeed, if our interest is in P(M), why should 
we be at all concerned about whether or no the distribution of arcsin P(M) 
is uniform? 
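
Edwards’s remark may be illustrated numerically. The following Python sketch rests on an assumption made here for illustration, namely that the angle φ = arcsin P(M) is taken as uniform on [0, π/2]; it is not Fisher’s own parametrization, and the function name is hypothetical. The marginal probabilities of 0, 1 and 2 successes in n = 2 trials are computed by the midpoint rule.

```python
from math import comb, pi, sin


def marginal_under_uniform_angle(n, x, steps=20_000):
    """Pr[X = x] in n trials when phi = arcsin(P) is uniform on [0, pi/2].

    The integral over phi is approximated by the midpoint rule.
    """
    h = (pi / 2) / steps
    total = 0.0
    for i in range(steps):
        p = sin((i + 0.5) * h)  # probability corresponding to the angle
        total += comb(n, x) * p ** x * (1 - p) ** (n - x)
    return total * h / (pi / 2)  # divide by the length of the interval


probs = [marginal_under_uniform_angle(2, x) for x in range(3)]
```

Under this assumption the three probabilities are 3/2 − 4/π, 4/π − 1 and 1/2 rather than 1/3 each, so that the numbers of successes are no longer equally probable.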

I believe that Bayes probably introduced his “table and balls” model for 
one of two reasons: (a) merely as an example, or (b) because he first gave 
the result for an unknown event and then added his model. The latter in- 
terpretation is, I believe, supported by Price’s introduction. Salient points 
from the second paragraph [pp. 370-371] of the latter are the following: 


(a) Bayes was originally concerned with finding a rule by whose use the 
probability of an unknown event E could be obtained.


(b) This rule, it appeared to him, must be “to suppose the chance the 
same that it [i.e. the probability of the unknown event] should lie 
between any two equidifferent degrees” [p. 371]. 


(c) The quaesitum would then follow by “the common method of pro- 
ceeding in the doctrine of chances” [p. 371]. 


(d) Bayes in fact gave a proof (suppressed by Price) on these lines. 


(e) Second thoughts suggested that not all might regard the postulate on 
which he argued as reasonable. 


(f) Bayes therefore “chose to lay down in another form the proposition 
in which he thought the solution of the problem is contained, and 
in a scholium to subjoin the reasons why he thought so, rather than
to take into his mathematical reasoning any thing that might admit 
dispute” [p. 371].

A discussion of the scholium and the postulate would be incomplete
without mention of Stigler [1982a], in which paper (contra Hacking) it is
asserted that “Bayes’s actual argument is free from the principal defect
it has been charged with” [p. 250] (see also Stigler [1986a, pp. 126-129]).
Stigler’s discussion²⁷ runs as follows: denoting by X the number of successes
in n = p + q trials, we may rewrite the corollary to Proposition 8 and the
footnote on p. 393 as

\[ \Pr[X = p] = \int_0^1 \binom{n}{p} x^p (1-x)^{n-p} \, dx = 1/(n+1) \]

for all p ∈ {0,1,...,n}. In terms of this discrete uniform distribution as the
marginal distribution of X, Stigler [1982a] constructs Bayes’s reasoning as
follows:


(i) For the table, Pr[X = p] = 1/(n + 1) for all p.


(ii) In the case of what Bayes describes as “an event concerning the prob-
ability [x] of which we absolutely know nothing antecedently to any
trials made concerning it” [pp. 392-393] [i.e. before X is observed],
one should argue that “concerning such an event [success] I have no
reason to think that, in a certain number [n] of trials, it should rather
happen any one possible number of times than another” [Bayes 1763a,
p. 393] (i.e. Pr[X = p] is constant).


(iii) Since Pr[X = p] = 1/(n + 1) both for the table and for any application
in which we are in a state of absolute ignorance, the situations are
parallel, and x must therefore have a uniform distribution not only
on the table, but also in the application. That is, Pr[X = p] constant
implies x is uniform.


The second step is characterized by Stigler [1982a, p. 253] as “a very
distant cousin” of the principle of insufficient reason. Three arguments are
advanced in support of this position.

Argument 1. Suppose that before X is observed, we “absolutely know noth-
ing” about x. If Pr[X = p] were not constant, there would exist p and p*
such that Pr[X = p*] > Pr[X = p]. A greater expectation
would then be attached to p* than to p, and a future bet (“expectation”,
in Bayes’s terminology) that p* would occur would be of higher value than
a similar one that p would occur. But if we expect one value of X rather
than another, then we are not in a situation where absolutely nothing is
known about x,


for X/n is an estimate of [x], and we should not describe our- 
selves as being in a position where we expect this estimate to 
be one value rather than another. [Stigler 1982a, p. 253]

Argument 2. Recalling that Bayes’s definition of probability was as an a
priori expectation, we note that his reluctance to postulate a uniform dis-
tribution for x was not a sign of an unwillingness to speak of a priori prob-
abilities. Rather, the specification of an a priori distribution was removed
from “the forever unobservable” x and placed “on the ultimately observ-
able X” [Stigler 1982a, p. 253]²⁸. Thus the second step “makes peculiarly
good sense in the context of Bayes’s unusual definition of probability (as
an expectation)” [Stigler, loc. cit.].

Argument 3. The second step is much more restrictive than the usually 
invoked principle of insufficient reason: for if knowing absolutely nothing 
necessitates our taking Pr[X = p] = 1/(n + 1), very few applications will be
found in which this requirement is met. Moreover, the argument is strongly
linked to the binomial model²⁹ (cf. my earlier remarks on the negative bi-
nomial distribution).

The third step in Stigler’s reconstruction of Bayes’s argument, namely 
Pr[X = p] constant implies x is uniformly distributed, while being “intu-
itively plausible at Bayes’s time” [Stigler 1982a, p. 253], needs verification.
As we have already indicated, however, knowledge of the first n moments, 
for every n, of a distribution on [0,1] will uniquely determine the distri- 
bution. Since Bayes’s “certain number of trials” is vague, and since the 
statement about Pr[X = p] is a priori, “we may be charitable to Bayes 
and assert that (perhaps inadvertently) he was not actually in error on 
this point” [Stigler 1982a, p. 254]. 

Stigler [1982a, p. 253] and [1986a, p. 129] notes that his interpretation 
of Bayes’s argument shows that, for any strictly monotone function f, 


\[ \Pr[X = p] = 1/(n+1) \;\Rightarrow\; \Pr[f(X) = f(p)] = 1/(n+1) . \]


Thus our knowing nothing about X is equivalent to our knowing nothing
about f(X), and this observation shows that Bayes’s argument is in fact
free of the objection raised to it by Fisher and others³⁰.

Geisser [1988] proposes three possible versions of Bayes’s result. In the
first of these a sequence {X_i} of independent and identically distributed
random variables taking on values in {0,1} is considered, with

\[ \Pr[X_i = 1 \mid \theta] = \theta = 1 - \Pr[X_i = 0 \mid \theta] . \]

Setting $R = \sum_{i=1}^{N} X_i$, we easily find that

\[ \Pr[R = r \mid \theta] = \binom{N}{r} \theta^r (1-\theta)^{N-r} , \]

and hence

\[ p(\theta \mid r) \propto \theta^r (1-\theta)^{N-r} . \]

This, the “Received Version”, is contrasted with the “Revised Version”
given by Stigler, which we have already discussed.

In the third version, labelled as “Stringent” by Geisser [1988, p. 150], it
is supposed that the abscissa of the point at which the ball initially rolled
comes to rest is a random variable Y. The actual value y of Y is then to be
inferred from N further rolls (of a second ball), it being known how often
the second ball comes to rest at a position with abscissa less than or equal
to y. Assuming that these rolls of the second ball are independent, we have

\[ p(y) = 1 \]

and

\[ \Pr[r \mid y] = \binom{N}{r} y^r (1-y)^{N-r} . \]

Hence

\[ p(y \mid r) \propto y^r (1-y)^{N-r} , \]

an expression independent of any parameters.
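
The constant of proportionality in the last expression, and the mean of the resulting distribution, can be obtained exactly. The following sketch is an illustration only, assuming the uniform p(y) = 1 of the “Stringent” version; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact integral of y**a * (1 - y)**b over [0, 1], integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def normalising_constant(r, N):
    """K such that K * y**r * (1 - y)**(N - r) integrates to 1 over [0, 1]."""
    return 1 / beta_integral(r, N - r)


def posterior_mean(r, N):
    """Mean of the distribution with density K * y**r * (1 - y)**(N - r)."""
    return beta_integral(r + 1, N - r) / beta_integral(r, N - r)


K = normalising_constant(3, 10)   # r = 3 occurrences in N = 10 rolls
mean = posterior_mean(3, 10)
```

For r = 3 and N = 10 the constant is 11!/(3! 7!) = 1320 and the mean is (r + 1)/(N + 2) = 1/3.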


4.6 The Appendix


In his appendix “Containing an Application of the foregoing Rules to some 
particular Cases”, Price discusses a number of examples illustrating (or 
purporting to illustrate) the use of the major result of the Essay. I propose 
to consider this appendix in some detail. 

The first illustration runs as follows: 


Let us first suppose, of such an event as that called M in the 
essay, or an event about the probability of which, antecedently 
to trials, we know nothing, that it has happened once, and that 
it is enquired what conclusion we may draw from hence with 
respect to the probability of it’s happening on a second trial. 
The answer is that there would be an odds of three to one for 
somewhat more than an even chance³¹ that it would happen on
a second trial. [p. 405] 


Price arrives at his solution by a direct application of Rule I (see §3.4), and 
then states 


which shews the chance there is that the probability of an event 
that has happened once lies somewhere between 1 and 1/2 or
(which is the same) the odds that it is somewhat more than an 
even chance that it will happen on a second trial. [p. 405] 


Now it is, I think, possible (though perhaps incorrect) to interpret³²
Price’s question as requiring an answer given by the rule of succession³³ (a
formula obtained by Laplace in 1774), in terms of which the probability of
a second occurrence of M is given by

\[ \int_0^1 x^2 \, dx \Big/ \int_0^1 x \, dx = \frac{2}{3} . \]


This interpretation however does not take account of Price’s requirement
that there be “more than an even chance that it will happen on a second
trial”, but this can be incorporated into the solution by taking cognisance
of Problem IV, pp. 180-183, of Condorcet’s³⁴ Essai sur l’application de
l’analyse à la probabilité des décisions rendues à la pluralité des voix of
1785. In a slightly different notation to that to be used in our discussion
of this problem in Chapter 6, let $S_i$ denote the occurrence on the i-th trial
of Price’s event M and let $P_{r,s}$ denote the event that P(M), the
probability of M, lies between r and s (with r < s). Then by Condorcet’s
solution, we have

\[ \text{(a)} \quad \Pr[P_{1/2,1} \mid S_1] = \int_{1/2}^1 x \, dx \Big/ \int_0^1 x \, dx = \frac{3}{4} \]

\[ \text{(b)} \quad \Pr[S_2 \mid S_1 \;\&\; P_{1/2,1}] = \int_{1/2}^1 x^2 \, dx \Big/ \int_{1/2}^1 x \, dx = \frac{7}{9} \]

\[ \text{(c)} \quad \Pr[S_2 \mid S_1] = \int_0^1 x^2 \, dx \Big/ \int_0^1 x \, dx = \frac{2}{3} \]

\[ \text{(d)} \quad \Pr[S_2 \;\&\; P_{1/2,1} \mid S_1] = \int_{1/2}^1 x^2 \, dx \Big/ \int_0^1 x \, dx = \frac{7}{12} \]

Here part (c) is Laplace’s solution, while (a) yields the numerical value
determined by Price — and yet there seems to be no mention of a “second
trial” in (a)!

However it is possible, by an appropriate interpretation, to obtain Price’s 
result from Bayes’s theory. The postulates of §2 of the Essay require that 
successes and failures be defined referentially to an initial event. Thus the 
event described by Price as having happened once plays the same rôle as
W, the first ball thrown, in the postulates. What is then required by Price is
essentially the probability that the next throw results in a “success” (say),
inasmuch as it falls in the interval [1/2, 1], the first ball having demarcated
the lower limit of this interval. The solution is then immediately given (for
one success) by

\[ \int_{1/2}^1 x \, dx \Big/ \int_0^1 x \, dx = \frac{3}{4} , \]

as Price showed. This is surely the correct interpretation³⁵.

Consideration is then given to the odds on the event’s happening once
again after it has happened twice, thrice, ..., p times. Price’s answers —
odds of $2^{p+1} - 1$ to 1 in the last case — are given similarly by considering,
in general,

\[ \int_{1/2}^1 x^p \, dx \Big/ \int_0^1 x^p \, dx = 1 - 1/2^{p+1} , \]

and while this is the solution provided by Proposition 10, it is perhaps
unfortunate to interpret it³⁶, as Price does, as the odds “for more than an
equal chance that it will happen on further trials” [p. 405].
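
Price’s figures can be reproduced in exact arithmetic. The sketch below is an illustration assuming a unit table, so that the chance in question is the ratio of the integral of x^p (1 − x)^q between the two named limits to the same integral over [0, 1]; the function names `integral_x_p_r_q`, `chance_between` and `odds` are hypothetical.

```python
from fractions import Fraction
from math import comb


def integral_x_p_r_q(p, q, a, b):
    """Exact integral of x**p * (1 - x)**q over [a, b] for rational a, b."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    # Expand (1 - x)**q by the binomial theorem and integrate term by term.
    for j in range(q + 1):
        k = p + j + 1
        total += comb(q, j) * (-1) ** j * (b ** k - a ** k) / k
    return total


def chance_between(p, q, a, b):
    """Chance that the probability lies in (a, b) after p successes, q failures."""
    return integral_x_p_r_q(p, q, a, b) / integral_x_p_r_q(p, q, 0, 1)


def odds(chance):
    """Odds in favour corresponding to a given chance."""
    return chance / (1 - chance)


half = Fraction(1, 2)
first_illustration = chance_between(1, 0, half, 1)   # one occurrence
after_three = odds(chance_between(3, 0, half, 1))    # three occurrences
```

With one occurrence and no failure the chance is 3/4 (odds of 3 to 1), and with p occurrences the odds are 2^(p+1) − 1 to 1, in agreement with the expressions above.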

Considering next the case of an event that is only known to have hap- 
pened ten times without failing, Price supposes the 


enquiry to be what reason we shall have to think we are right 
if we guess that the probability of it’s happening in a single 
trial lies somewhere between 16/17 and 2/3, or that the ratio 
of the causes of it’s happening to those of it’s failure is some 
ratio between that of sixteen to one and two to one. [p. 406] 


That is, we are trying to find

\[ \Pr[2/3 < x < 16/17 \mid 10 \text{ successes and } 0 \text{ failures}] \]

or, denoting by C(E) the “causes of E”,

\[ \Pr[2 < C(E)/C(\bar{E}) < 16 \text{ given } 10 \text{ successes and } 0 \text{ failures}] . \]


The former formulation is exactly that of Bayes’s Proposition 9: the latter 
(in terms of causes) has no parallel in the Essay. Price once again uses 
Bayes’s method correctly, obtaining the answer 0.5013 &c. 

In discussing his next example, that concerned with the throwing of a 
die, Price argues in such a manner as to confirm our second interpretation 
of his first illustration. 


It will appear, therefore, that after the first throw and not be- 
fore, we should be in the circumstances required by the condi- 
tions of the present problem, and that the whole effect of this 
throw would be to bring us into these circumstances. That is: 
the turning the side first thrown in any subsequent single trial 
would be an event about the probability or improbability of 
which we could form no judgement, and of which we should 
know no more than that it lay somewhere between nothing and 
certainty. With the second trial then our calculations must be- 
gin. [p. 407] 


Some numerical work follows.

Attention is next given to the famous problem of the probability of the
sun’s rising³⁷. This solar problem, often ignorantly supposed to have origi-
nated with Laplace, is in fact to be found, albeit but vaguely expressed, in
various forms in Hume’s writings³⁸. It indeed provides a good illustration
of Edgeworth’s [1884b] statement that


the much decried method of Bayes may be employed to deduce 
from the frequently experienced occurrence of a phenomenon 
the large probability of its recurrence. [p. 228] 


In this problem (an entirely similar argument to that given in the die- 
tossing example mentioned above being advanced) Price explains that the 
first sinking of the sun a sentient person who has newly arrived in this 
world would see, leaves him “entirely ignorant whether he should ever see 
it again” [p. 409]. As Pearson [1978] has it 


The first experiment counts nothing because you must know 
there is a sun or a red ball in a bag before you can argue about
the repetition of drawing red balls. [p. 368] 


Thus, according to Price, 


let him see a second appearance or one return of the Sun, and an 
expectation would be raised in him of a second return, and he 
might know that there was an odds of 3 to 1 for some probability 
of this. [p. 409] 


This may be expressed symbolically as

\[ \Pr[(1/2) < x < 1 \mid \text{one return}] = \int_{1/2}^1 x \, dx \Big/ \int_0^1 x \, dx = 3/4 . \]


Next 


let it be supposed that he has seen it return at regular and 
stated intervals a million of times³⁹. The conclusions this would
warrant would be such as follow — There would be the odds of 
the millionth power of 2, to one, that it was likely that it would 
return again at the end of the usual interval. [pp. 409-410] 


As Zabell [1988a] has pointed out, there is a slight error here, in that n, the
number of occurrences of the event in question, being 1,000,000, the odds
should be $2^{1,000,001}$ to 1 on a reappearance. The appropriate exponent of
2 is (n + 1) — i.e. the number of risings of the sun — and not n, which is
the number of returns of the sun. It was possibly a hasty reading of this
section of the Appendix that was responsible for Buffon’s incorrectly giving
the odds as $2^{m-1}$ : 1, where m is the number of risings (see §5.8).

This example is clearly analogous to his first illustration, though it should
be noted that while Price is correctly applying Bayes’s results, a tendency 
to apply them to future events seems to be making its presence felt. It is, 
of course, quite possible that Bayes intended his solution to be applicable 
to the case of “a single throw” after experience: however, this is nowhere 
explicitly stated in the Essay, and, as we shall see in Chapter 7, Bayes’s 
result is in accord with not interpreting this “single trial” in the predictive 
sense (indeed, the actual statements of his ninth and tenth propositions 
are in the past tense). However, it is not obvious from the first quotation 
in §4.5 above (“In what follows...”) that Bayes intended his result to be 
used only in a retrodictive sense: in fact, Price writes quite explicitly in his 
introductory letter that Bayes’s intent originally was to find the probability 
of an event given a number of occurrences and failures. 

It is also worth noting that Price passes without any qualms from the 
application of probability in games of chance to its use in connexion with 
physical phenomena. The distinction between chance (or randomness) and 
probability (an attribute of opinion) had been observed certainly until the 
late seventeenth century (see Shafer [1978]), such a distinction of course 
having been deliberately avoided by Bayes who, as we have already seen, 
carefully identified the two. Whether the notion that is applicable in the 
case of the tossing of a die is also applicable in the case of natural phenom- 
ena could be debated: some would reject the analogy, while others might 
accept it in connexion with matters such as birth ratios but refuse to coun- 
tenance it — or at least query its fitness — in matters such as the example 
discussed by Price here⁴⁰.

Price next turns his attention “to cases where an experiment has some- 
times succeeded and sometimes failed” [p. 411]. To illustrate the general 
ideas he considers the drawing of blanks and prizes from a lottery, fixing 
his attention on what is essentially 


Pr[x₁ < x < x₂ | p blanks and q prizes drawn] ,


where x is the (true?) proportion of blanks to prizes in the lottery. Once 
again this is a straightforward and correct application of Bayes’s results. 

Price then passes some remarks on the probability of causes⁴¹, and
draws towards a conclusion by noting that “The foregoing calculations 
further shew us the uses and defects of the rules laid down in the essay” 
[p. 417]. These defects seem to be that the second and third rules “do not 
give us the required chances within such narrow limits as could be wished” 
[p. 417]. However, these limits become narrower as q increases with respect 
to p, while the exact solution is given by the second rule when p = q. 


These two rules therefore afford a direction to our judgement 
that may be of considerable use till some person shall discover 
a better approximation to the value of the two series’s in the 
first rule. [pp. 417-418]

A footnote (possibly added in proof?) now states that Price had found
an improvement of the approximation in the second and third rules, by
showing that

\[ 2\Sigma \Big/ \left( 1 + 2E a^p b^q + 2E a^p b^q / n \right) \]

“comes almost as near to the true value wanted as there is reason to desire,
only always somewhat less” [p. 418]. This too will be reconsidered later.

In his introduction to the Essay Price had commented on de Moivre’s 
rules 


to find the probability there is, that if a very great number 
of trials be made concerning any event, the proportion of the 
number of times it will happen, to the number of times it will 
fail in those trials, should differ less than by small assigned 
limits from the proportion of the probability of its happening to the
probability of its failing in one single trial. [pp. 372-373]


No person, to the best of Price’s knowledge, had yet shown how to solve 
the converse problem, viz. 


the number of times an unknown event has happened and failed 
being given, to find the chance that the probability of its happening 
should lie somewhere between any two named degrees 
of probability. [p. 373] 


Therefore, de Moivre’s work was not sufficient to make consideration of 
this point unnecessary. Price now concludes the Appendix by noting that 


what most of all recommends the solution in this Essay is, that 
it is compleat in those cases where information is most wanted, 
and where Mr. De Moivre’s solution of the inverse problem can 
give little or no direction; I mean, in all cases where either p or 
q are of no considerable magnitude. [p. 418] 

In view of the important role played by Bayes’s Theorem in modern 
subjective probability, it might be of no little interest briefly to consider the 
view of de Finetti, a leading exponent of subjective probability, on this 
result. These views are expressed in §9.2 of his Probability, Induction and 
Statistics of 1972. 

After pointing out that Bayes’s formulation of his problem is, strictly 
speaking, unsatisfactory, de Finetti singles out the following assumptions 
of the Essay for detailed examination: 


(1) The “unknown probability” p has probability dx of being 
comprised in any interval (x, x + dx) in (0,1). 

(2) The events considered are independent under each hypothesis 
p = x as to the value of p. 

(3) Therefore, after the observations, the probability that p falls 
between x and x + dx becomes $K x^m (1 - x)^{n-m}\,dx$. 
[1972, p. 158] 


(Here m and n — m denote respectively the numbers of favourable and 
unfavourable events that have already happened, while K is a normalizing 
constant.) 
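
Statement (3) lends itself to a small numerical illustration. The sketch below assumes the uniform prior of (1); the function names are illustrative only and come neither from de Finetti nor from the Essay. It relies on the standard identity that expresses the integral of $K x^m (1 - x)^{n-m}$ over (0, t) as a binomial tail sum, so that exact rational arithmetic can be used.

```python
from fractions import Fraction
from math import comb


def normalizing_constant(m: int, n: int) -> int:
    """K in K x^m (1-x)^(n-m), i.e. the reciprocal of the Beta integral B(m+1, n-m+1)."""
    return (n + 1) * comb(n, m)


def posterior_cdf(t: Fraction, m: int, n: int) -> Fraction:
    """Pr[p < t | m favourable events in n trials] under a uniform prior.

    Uses the identity: integral of K x^m (1-x)^(n-m) over (0, t)
    equals Pr[Binomial(n+1, t) >= m+1].
    """
    return sum(
        comb(n + 1, k) * t**k * (1 - t) ** (n + 1 - k)
        for k in range(m + 1, n + 2)
    )


def posterior_interval(a: Fraction, b: Fraction, m: int, n: int) -> Fraction:
    """Pr[a < p < b | m favourable events in n trials] under a uniform prior."""
    return posterior_cdf(b, m, n) - posterior_cdf(a, m, n)
```

With no observations (m = n = 0) the function returns the prior itself, so that `posterior_cdf(t, 0, 0)` is simply t, as assumption (1) requires.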

Noting that (3) admits of dispute only inasmuch as it concerns “the ex- 
tent of the domain of applications, which can be narrowed if one wishes 
to confine the notion of probability to a restrictive meaning” [p. 158], de 
Finetti goes on to point out that the true meaning of the hypothesis in (2) 
may be clarified to remove reference to the “unknown probability” p. 

Turning his attention next to (1), de Finetti notes that a reformulation 
of the problem might permit the removal of the meaninglessness of the 
phrase “unknown probability”: moreover, this “Bayes’ postulate” is “not 
necessary to the expression of the problem in terms of Bayes’ theorem” 
[p. 159] (as we have in fact already seen). The vagueness in the phrase 
“knowing nothing” leads de Finetti to conclude [p. 159] that the postulate 
is either a tautology (if “knowing nothing” means that a uniform distribution 
is to be attributed to p), or else a nonsense (if “nothing” is taken 
literally, for in this case knowing nothing about $E_i$ will mean knowing nothing 
about $E_i E_j$, and hence $p^2$ will have to have a uniform distribution also). 

After some comments on Laplace’s more general theorem (in which the 
initial density need not be uniform), de Finetti recalls some results and ap- 
plications from the Essay*?: further details may be found on pp. 160-162 
of his book cited above. 


5 Miscellaneous Investigations from 1761 to 1822 


As by successive tradition from our forefathers 
we have received it. 


Marcus Aurelius Antoninus. 


Of the 54 references cited in Todhunter’s “Chronological List of Authors” 
[1865, pp. 619-620] as contributing to probability theory from 1761 to 1822 
(and excluding Condorcet and Laplace) only some twelve make any con- 
tribution to our present topic. The writings to be discussed in this chapter 
are given, as in others, in order of publication; but once an author is cited, 
any further pertinent publications of his (although they may well have been 
written after those of another author not yet cited) will be discussed in the 
same section. 


5.1 Moses Mendelssohn (1729-1786) 


Moses Mendelssohn¹ (Moshe ben Menachem, Moshe miDessau), the son 
of Mendel Heymann and the grandfather of the arguably more famous 
Jakob Ludwig Felix Mendelssohn-Bartholdy, included an essay on probability 
in his Philosophische Schriften. This appears under the heading 
“Ueber die Wahrscheinlichkeit” as chapter IV of the second volume of the 
second edition² of 1777. 

The only pertinent point from this essay is the following: if an event A 
has occurred (almost) simultaneously with an event B on n occasions, the 
probability of a causal connexion is n/(n + 1). No argument for this value³ 
is given, though its use is illustrated by an example concerning repeated 
onsets of giddiness after drinking coffee. This example is in fact not as 
strange as it might at first seem: Mendelssohn belonged, in the 1750’s, to a 
closed society in Berlin whose members met regularly to talk, drink coffee 
and discuss learned matters⁴. 

In a paper on upper and lower probabilities, Dempster [1966, p. 369] 
shows that, when T of the first n sample individuals are observed to fall in 
a certain category, the upper and lower probabilities ($\overline{P}$ and $\underline{P}$ respectively) 
that the next sample individual will fall into that category are given by 


$$\overline{P} = (T + 1)/(n + 1), \qquad \underline{P} = T/(n + 1).$$


On replacing T by n we obtain Mendelssohn’s value from $\underline{P}$. 

One might note here that the invariance theories of both Harold Jeffreys 
[1961] and Perks [1947] lead to a prior density proportional to $1/\sqrt{p(1 - p)}$. 
This leads in turn to the posterior expectation (T + 1/2)/(n + 1) (in our 
notation), which is midway between the values $\overline{P}$ and $\underline{P}$ given above, and 
which is also, as Good has noted [1965, pp. 18-19], a compromise between 
the maximum likelihood estimate T/n and the result (T + 1)/(n + 2) given 
by Laplace’s rule of succession⁵. 
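
The relations between these five values are easily checked with exact fractions. The following is a minimal sketch; the function and the dictionary keys are labels chosen here for illustration, not notation from any of the authors cited.

```python
from fractions import Fraction


def estimates(T: int, n: int) -> dict[str, Fraction]:
    """Point values for the chance that the next individual falls in the category."""
    return {
        "lower": Fraction(T, n + 1),                         # Dempster's lower probability
        "upper": Fraction(T + 1, n + 1),                     # Dempster's upper probability
        "jeffreys_perks": Fraction(2 * T + 1, 2 * (n + 1)),  # (T + 1/2)/(n + 1)
        "max_likelihood": Fraction(T, n),                    # T/n
        "laplace": Fraction(T + 1, n + 2),                   # rule of succession
    }


e = estimates(7, 10)
midpoint = (e["lower"] + e["upper"]) / 2  # equals e["jeffreys_perks"]
```

The "compromise" is an exact one: (T + 1/2)/(n + 1) is the mediant of T/n and (T + 1)/(n + 2), and therefore always lies between them.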

Referring (incorrectly) to “the theorems of Bayes and Laplace” (by which 
is apparently meant the rule of succession), Todhunter [1865, p. 617] notes 
that the probability that an event, which has already happened n times, 
will happen a further time, is (n + 1)/(n + 2): he comments on the close 
agreement, for large n, of this result with that obtained by Mendelssohn, 
but the coincidence is more apparent than real, for there is no sign in 
Mendelssohn’s essay of any deep knowledge of probability (or expertise 
therein)⁶. We must conclude with Todhunter (loc. cit.) that “we cannot 
therefore consider that he [i.e. Mendelssohn] in any way anticipated Bayes”. 


5.2 Johann Heinrich Lambert (1728-1777) 


Lambert published a number of works on the application of probability⁷, 
in particular in demographic statistics and the theory of errors: indeed, the 
latter term is possibly due to him, it having appeared as Theorie der Fehler 
in the Vorberichte to the first volume of his Beyträge zum Gebrauche der 
Mathematik und deren Anwendung of 1765. Lambert may thus be seen as 
a predecessor to Gauss in this field — and, for that matter, as a follower of 
Leibniz in an attempt to incorporate a probability calculus into a general 
logical system. 

Lambert’s work on inverse probability seems to be limited⁸ to the remarks 
appearing in the fifth chapter of the second volume of his Neues 
Organon oder Gedanken über die Erforschung und Bezeichnung des Wahren 
und dessen Unterscheidung vom Irrthum und Schein of 1764. The chapter 
is entitled “Von dem Wahrscheinlichen”, and appears in the section headed 
“Phänomenologie oder Lehre von dem Schein”. 

The ideas of prior and posterior probabilities are introduced with reference 
to games of chance as follows: 


Die Glücksspiele haben das besonders, daß man aus ihrer Einrichtung 
die möglichen Fälle abzählen, und den Grad der Möglichkeit 
von jeden bestimmen kann. Auf diese Art wird die 
Wahrscheinlichkeit jeder Fälle a priori berechnet. Es erhellet 
aber aus erstgesagtem, das es auch a posteriori geschehen 
könnte, wenn man das Spiel lange oder unendlich vielmale 
wiederholte. [§153, p. 323] 


How the “Grad der Moglichkeit” and the “Wahrscheinlichkeit” are related 
is not spelt out, nor does Lambert state how the determination of the a 
posteriori probability is to be effected, though there seems to be a sugges- 
tion that a (finite) frequency should be used. It is perhaps interesting to 
note that a similar remark had been made earlier by Jacob Bernoulli⁹ (and 
was to be made later by Laplace; see §7.6), viz. 


Tutissima probabilitates æstimandi via in istis est non a priori, 
seu causa, sed a posteriori seu ab eventu in similibus exemplis 
multoties observato. [Bernoulli, 1975, p. 46] 


and again 


Verum enimvero alia hic nobis via suppetit, qua quæsitum obtineamus; 
& quod a priori elicere non datur, saltem a posteriori, 
hoc est, ex eventu in similibus exemplis multoties observato 
eruere licebit. [Bernoulli, 1713, p. 224] 


The idea of inverse probability then makes itself known. Lambert writes 


Denn so ist unstreitig, daß, wenn jede Folgen, die eine Ursache 
in vorgegebenen Umständen nach sich ziehen muß, durchaus in 
der Erfahrung gefunden werden, der Schluß, daß sie von nichts 
anders herrühren können, richtig gemacht werden kann. 
[§162, p. 329] 


This seems to suggest that, if E is the observed result and $C_i$ the ith cause, 
then 

$$\Pr[E \mid C_i] = 1 \;\Rightarrow\; \Pr[C_i \mid E] = 1,$$

which follows from 

$$\Pr[E \mid C_i] = \Pr[E]\,\Pr[C_i \mid E] / \Pr[C_i]$$

if Pr[E] = 1 (or, as Lambert has it, E is found completely in experience) 
and $\Pr[C_i] = 1$. There is of course no sign that Lambert had even an inkling 
of our modern definition of conditional probability, and one might indeed 
wonder whether he was seized of the difference¹⁰ between $\Pr[E \mid C_i]$ and 
$\Pr[C_i \mid E]$. 

Some simple questions in direct probability follow next, and after this 
Lambert turns his attention to questions involving the credibility of witnesses. 
We shall have something to say on this matter from time to time 
during the course of the present work, and it therefore seems not inappro- 
priate to note Lambert’s views here, even though they are not of direct 
relevance to our main theme. Before citing appropriate extracts from Lam- 
bert’s work, however, it might be useful to say something about the general 
thinking on chance and probability that was current at that time. 

I have mentioned both “chance” and “probability” in the last sentence, 
and on first thought this might seem excessive. However, as Shafer [1978] 
has noted in remarking on the modern difference between aleatory and 
epistemic probability, 


Until the late seventeenth century there was a similar distinc- 
tion between chance, or randomness, and probability, which was 
an attribute of opinion. [p. 310] 


Thus probability was a measure of one’s subjective certainty, while chance 
was the sort of thing that arose in games of chance. This distinction was 
observed in Jacob Bernoulli’s Ars Conjectandi of 1713, though it soon 
became blurred, and by the time de Moivre and Montmort had written 
their influential works (The Doctrine of Chances and Essay d’analyse 
sur les jeux de hazard respectively) it had all but vanished¹¹. 

According to Bernoulli [1713], probabilities were to be calculated from 
arguments: 


Probabilitates æstimantur ex numero simul & pondere argumentorum, 
quæ quoquo modo probant vel indicant, rem aliquam 
esse, fore aut fuisse. Per Pondus autem intelligo vim 
probandi. [p. 214] 


In this context the probability p of a proposition and the probability q of 
its negation are seen to satisfy the following inequalities: 


$$0 \le p \le 1, \qquad 0 \le q \le 1, \qquad p + q \le 1.$$


That is, the probabilities of a proposition and of its negation do not neces- 
sarily sum to 1 (Shafer [1978] refers to this as “non-additive probability” ). 
Arguments, writes Bernoulli [1713], may be either pure or mixed: indeed, 


Præter hanc argumentorum distinctionem aliud quoque in iis 
discrimen observare licet, dum quædam eorum sunt pura, alia 
mixta. Pura voco, quæ in quibusdam casibus ita rem probant, 
ut in aliis nihil positivé probent: Mixta, quæ ita rem probant in 
casibus nonnullis, ut in cæteris probent contrarium rei. [p. 218] 


When it comes to the combination of arguments, Bernoulli gives separate 
rules depending on whether the various arguments involved are pure or 
mixed, that rule for the combination of pure and mixed arguments being 
later shown by Lambert to be unsatisfactory. 

As has already been mentioned, Bernoulli’s recognition of non-additive 
probabilities almost disappeared during the eighteenth century¹². The notion 
only reappeared in Lambert’s work, where we find, for example, the 
following 


Die Grade der Wahrscheinlichkeit, die man für das Bejahen 
und für das Verneinen der Schlußsätze herausbringt, machen 
zusammengenommen, nicht immer ein Ganzes, weil öfters noch 
ein beträchtlicher Theil unbestimmt bleibt, wie wir es in dem 
angebrachten Beyspiel der Schlußketten sehen. Man hat demnach 
allerdings dieses unbestimmten Theils Rechnung zu tragen, 
wenn man aus dem Grade der Wahrscheinlichkeit auf den 
Grad der Unwahrscheinlichkeit schließen will. [§212, p. 377] 


Shafer [1978] has shown how Lambert’s handling of probability in the syllogism 
led to an awareness of non-additive probabilities¹³; broadly speaking, 
a minor premise in a syllogism of the first figure that is merely probable 
results in a conclusion whose probability is non-additive¹⁴. 

Lambert’s rule for the combination of testimony runs as follows: 


Man setze zween Zeugen, die einerlei aussagen. Des ersten Glaubwürtigkeit 
sei so beschaffen, daß er gegen 10 Wahrheiten 3 Unwahrheiten 
und 1 Lüge sagt: das ist, das man ihm in 10 Fällen 
glauben, in 3 Fällen nicht glauben, und in einem Fall des Gegentheil 
glauben müsse, wenn man die Wahrheit treffen will. Dieses 
drücken wir nun so aus 


10a + 3u + 1e. 

Eben so sei die Glaubwürdigkeit des andern 

12a + 5u + 2e . 


Werden nun diese Fälle mit einander multiplicirt, so ist das 
Product 


120aa + 86au + 15uu + 11eu + 2ee + 32ae . 


Aus diesem Product wird 32ae weggelassen, weil es unmöglich 
ist, dem einem Zeugen die Aussage und dem andern das Gegentheil 
zugleich zu glauben. Ferner wird 120aa + 86au zusammengezogen, 
und 206a daraus gemacht. Denn ungeacht man in 
den 86 Fällen dem einen Zeugen nicht glaubt, so glaubt man 
doch dem andern. Auf gleiche Art zieht man 2ee + 11eu zusammen, 
und macht 13e daraus. Denn bei den 11eu fällt der Glaube 
auf das Gegentheil der Aussage. Demnach haben wir 


206a + 15u + 13e 

für die Glaubwürtigkeit eines Zeugen, der so viel gilt, als beide 
erstere zusammengenommen. Kommt noch ein dritter Zeuge 
dazu, so wird seine Glaubwürtigkeit mit der erstgefundenen 
auf eben die Art multiplicirt, um die von einem Zeugen zu 
finden, der so viel gilt als alle drei zusammengenommen. Die 
algemeine Formel ist diese: 


1. Zeuge, Ma + Nu + Pe 
2. Zeuge, ma + nu + pe 
Beide, (Mm + Mn + mN)a + Nn.u + (Pp + Pn + pN)e. 


Ist des einen Zeugen Glaubwürtigkeit vollständig, so ist n = 
p = 0, demnach fallen im Product alle Glieder, u, e, weg, welches 
anzeigt, daß die übrigen Zeugen seine Glaubwürtigkeit, weder 
vermehren noch vermindern, weil alle übrigbleibende Fälle a 
sind. Hingegen wo keines Zeugen Glaubwürtigkeit vollständig 
ist, da kommt in der Summe von allen noch immer u und e vor, 
und folglich auch nur Wahrscheinlichkeit für die Aussage. 
[§237, pp. 398-400] 


The formula given above for the general case may be interpreted as follows: 
suppose that the first witness must be believed in M cases, that he must 
not be believed in N cases, and that the opposite of what he says must be 
believed in P cases. This can be expressed as 


Ma + Nu + Pe. 


Similarly, letting m, n and p respectively represent the corresponding results 
for the second witness, we have 


ma + nu + pe 

as the expression of his credibility. The product of these two is 

Mmaa + (Mn + Nm)au + (Mp + Pm)ae + Nnuu + (Np + Pn)eu + Ppee . 


Now the term (Mn + Nm)au is to be combined with Mmaa, for although 
one witness is not believed, the other is. Similarly, (Np + Pn)eu is to be 
combined with Ppee, for now, although one witness is not believed, the 
opposite of what is said is to be believed. Finally the term (Mp + Pm)ae is 
to be omitted altogether, for here the testimony is to be believed because 
of what one witness says while the opposite is to be believed on account of 
the testimony of the other. Thus the testimony of the two witnesses can be 
combined into the equivalent testimony of one witness as 


(Mm + Mn + Nm)a + Nnu + (Np + Pn + Pp)e. 

Various extensions of this result to different situations follow, during the 
course of which Lambert confutes Bernoulli’s rule for the combination of 
testimony. 

Modern research shows that Lambert’s rule is a special case of Demp- 
ster’s rule for the combination of belief functions: see Shafer [1976a] and 
[1976b, pp. 374-376]. 
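
Lambert's rule as interpreted above is mechanical enough to be written out directly. The sketch below is only an illustration: the class and function names are hypothetical, and the three counts are held as integers exactly as in Lambert's own example.

```python
from typing import NamedTuple


class Testimony(NamedTuple):
    a: int  # cases in which the witness is to be believed
    u: int  # cases in which the witness is not to be believed
    e: int  # cases in which the opposite of what he says is to be believed


def combine(w1: Testimony, w2: Testimony) -> Testimony:
    """Multiply the two expressions, drop the contradictory ae terms, regroup."""
    return Testimony(
        a=w1.a * w2.a + w1.a * w2.u + w1.u * w2.a,  # aa and au are counted as a
        u=w1.u * w2.u,                              # uu remains undetermined
        e=w1.e * w2.e + w1.e * w2.u + w1.u * w2.e,  # ee and eu are counted as e
    )


both = combine(Testimony(10, 3, 1), Testimony(12, 5, 2))
# both == Testimony(a=206, u=15, e=13), Lambert's own figures
```

A third witness is handled by applying `combine` again to the result; since the rule is symmetric in the two witnesses, the order in which they are taken does not matter.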


5.3 Bayes and Price 


5.3.1 Bayes’s paper on divergent series 


Strictly speaking, of course, discussion of this paper has no place here. 
Nevertheless, in view of Bayes’s extremely limited output of writings on 
mathematical topics, and because of our interest in Bayes in general, I propose 
to give it some attention, albeit brief¹⁵. 

This paper, published in the Philosophical Transactions for 1763, pages 
269-271, is referred to by Price in a footnote to page 401 of the Essay, in 
connexion with the evaluation of factorials needed for Rule 2. Price was 
possibly also responsible for the submission of this paper¹⁶, since, although 
it bears only the heading “A Letter from the late Reverend Mr. Thomas 
Bayes, F.R.S. to John Canton, M.A. and F.R.S.”, it was read on the 24th 
November 1763. 

Here Bayes considers the expansion of the series for log z! already considered 
by “some eminent mathematicians” [p. 269]. In his introductory note 
preceding the 1940 reprinting of this paper by Molina, W. Edwards Deming 
suggests that Bayes had de Moivre and Stirling in mind when using this 
phrase, and adduces in support of this suggestion the following remarks¹⁷: 


(i) Bayes’s use of c for 2π is commonly found in the writings of both de 
Moivre and Stirling; 


(ii) while many series were available to Bayes as illustrations, the one he 
in fact used is that which de Moivre and Stirling studied extensively; 


(iii) Price, Bayes’s intimate, refers, on p. 401 of the Essay, to “Mr. De 
Moivre, Mr. Simpson and other eminent mathematicians”. 


But all this is mere conjecture: let us return to the paper in question. 
Bayes states that it has been asserted that $\sum_{k=1}^{z} \log k$ is equal to 


$$\tfrac12 \log c + (z + \tfrac12)\log z - S,$$

where c denotes the circumference of a circle whose radius is unity and 
where 


$$S = z - \frac{1}{12z} + \frac{1}{360z^3} - \frac{1}{1260z^5} + \frac{1}{1680z^7} - \frac{1}{1188z^9} + \cdots .$$


However, examination of the manner in which the first few coefficients in 
S are formed persuaded Bayes that 


at length the subsequent terms of this series are greater than 
the preceding ones, and increase in infinitum, and therefore the 
whole series can have no ultimate value whatsoever. [p. 270] 


Nowadays, of course, the series would be more correctly (and suggestively!) 
written¹⁸ as 


$$\log z! \sim \log\sqrt{2\pi} + (z + \tfrac12)\log z - S.$$


(The determination of the constant $\sqrt{2\pi}$ is due to Stirling; see Archibald 
[1926, p. 675], and note the original Latin on p. [2].) Bayes was apparently 
among the first (if not the first) to appreciate the asymptotic character of 
the series¹⁹, and for this alone he surely deserves some acknowledgement. 
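
Bayes's observation that the terms eventually grow can be reproduced exactly. The sketch below assumes the modern form of the coefficients, $B_{2k}/(2k(2k-1))$ with $B_{2k}$ the Bernoulli numbers, which is not how Bayes himself formed them; the function names are illustrative. The values returned are the terms that follow z in S, with the opposite sign.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m: int) -> list[Fraction]:
    """B_0, ..., B_m (with B_1 = -1/2) from the standard recurrence."""
    B = [Fraction(1)]
    for j in range(1, m + 1):
        B.append(-sum(comb(j + 1, k) * B[k] for k in range(j)) / (j + 1))
    return B


def stirling_terms(z: int, count: int) -> list[Fraction]:
    """The first `count` terms B_2k / (2k (2k-1) z^(2k-1)), k = 1, 2, ..."""
    B = bernoulli_numbers(2 * count)
    return [
        B[2 * k] / (2 * k * (2 * k - 1) * z ** (2 * k - 1))
        for k in range(1, count + 1)
    ]


sizes = [abs(t) for t in stirling_terms(1, 8)]
# For z = 1 the sizes fall as far as the fourth term (1/1680) and rise thereafter.
```

For larger z the turning point merely occurs later: for any fixed z the terms ultimately increase without bound, which is the behaviour that Bayes detected.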


5.3.2 The supplement to the Essay 


On the 26th of November 1764 Price submitted to John Canton a sup- 
plement to the Essay. The first part of this paper was apparently due to 
Bayes, for shortly before Section 13 we find the words “Thus far have I 
transcribed Mr. Bayes”. Indeed, the fact that Bayes’s proof given here is 
only for Rule 2 perhaps lends weight to my earlier assertion (see §3.4) that 
the third rule was due to Price. 

This supplement is devoted to proofs and some slight elaboration of the 
Rules of the Essay²⁰: we shall content ourselves here with some fairly general 
remarks (a comparatively detailed discussion may be found in Sheynin 
[1969]). 

After having mentioned the refinements he had proposed to the bounds 
given by Bayes, Price went on in his letter to Canton to say²¹ 


Perhaps, there is no reason about being very anxious about 
proceeding to further improvements. It would, however, be very 
agreeable to me to see a yet easier and nearer approximation 
to the value of the two series’s in the first rule: but this I must 
leave abler persons to seek, chusing now entirely to drop this 
subject. [p. 296] 


Since Price did not die until 1791, it is to be hoped that his desire expressed 
in this quotation was realized. 

Part of the proof given here may be found, in manuscript, in the notebook 
of Bayes referred to earlier: we shall discuss the relevant passage later on. 

It is perhaps worth noting that the Rule as given in the Supplement 
covers more cases than that of the Essay²². In the latter it is stated that, 
if I guess that the probability of an event’s happening in a single trial lies 
between p/n + z and p/n - z, the chance of my being right is greater than 


$$2\Sigma / (1 + 2E a^p b^q + 2E a^p b^q/n) \tag{1}$$

and less than 

$$2\Sigma / (1 - 2E a^p b^q - 2E a^p b^q/n), \tag{2}$$


while if p = q the chance is exactly $2\Sigma$. (Here a, b, E and $\Sigma$ are as defined 
in Rule 2 of the Essay; see §3.4.) 

In the Supplement, however, the following six cases are considered: 

Case 1. If q > p and I judge that the probability of the event’s happening 
in a single trial lies between p/n and p/n + z, the chance that I am correct 
is greater than $\Sigma$ and less than 

$$\Sigma \times \frac{1 + 2E a^p b^q + 2E a^p b^q/n}{1 - 2E a^p b^q - 2E a^p b^q/n}. \tag{3}$$

Case 2. If q > p, and the limits in the previous case are replaced by p/n - z 
and p/n, my chance of being right is less than $\Sigma$ and greater than 

$$\Sigma \times \frac{1 - 2E a^p b^q - 2E a^p b^q/n}{1 + 2E a^p b^q + 2E a^p b^q/n}. \tag{4}$$

Case 3. If p > q, and the limits are p/n and p/n + z, the chance of a correct 
guess is less than $\Sigma$ and greater than (4). 

Case 4. If p > q and the limits are p/n - z and p/n, the chance of my being 
correct is greater than $\Sigma$ and less than (3). 

Case 5. If p = q, my chance is $\Sigma$ exactly (this must refer to either the 
interval (p/n, p/n + z) or (p/n - z, p/n)). 

Case 6. Whether p > q or q > p, and I judge that the probability lies 
between p/n - z and p/n + z, the chance of my being correct is greater 
than (1) and less than (2). If p = q the chance is $2\Sigma$ exactly. 

Notice that it is this last case that is given as Rule 2 in the Essay. 

It is now that Bayes’s contribution ceases and Price proceeds to his own 
improvements of the bounds, motivating his investigation in the following 
words: 


It appears, from the Appendix to the Essay, that the rule here 
demonstrated, though of great use, does not give the required 
chance within limits sufficiently narrow. It is therefore necessary 
to look out for a contraction of these limits... [p. 310] 


Articles 13-28 are devoted to this investigation, Price returning in the last 
of these articles to one of the examples mentioned in the Appendix to the 
Essay (the fifth case, p. 415). 

In Article 24 Price concludes that under the conditions of Case 6, the 
chance of my being right is greater than (1) and less than $2\Sigma$. He next passes 
on to “determine within still narrower limits whereabouts the required 
chance must lie” [art. 24], his conclusion being summarized in Article 28 
as follows [notation slightly altered]: 


If either p or q is greater than 1, the true chance that the probability 
of an unknown event which has happened p times and 
failed q in (p + q) or n trials, should lie somewhere between 
p/n + z and p/n - z is less than $2\Sigma$, and greater than 

$$\Sigma + \Sigma \times \frac{1 - 2E a^p b^q - 2E a^p b^q/n}{1 + E a^p b^q + E a^p b^q/n}.$$

If either p or q is greater than 10, this chance is less than $2\Sigma$, 
and greater than 

$$\Sigma + \Sigma \times \frac{1 - 2E a^p b^q - 2E a^p b^q/n}{1 + E a^p b^q/2 + E a^p b^q/2n}.$$

[p. 323] 


To show the improvement effected by his limits, Price returns to an 
example considered in his Appendix to the Essay: an event, concerning 
which nothing is known, has happened 100 times and failed 1000 times 
in 1100 trials. The chance that the probability of this event lies between 
1/11 + 1/110 and 1/11 - 1/110, as computed by Bayes’s Rule 2 (art. 12), lies between 
0.6512 (odds of 186 to 100) and 0.7700 (odds of 334 to 100). (The numbers 
given originally in the appendix were incorrect, since m² was there set equal 
to n³/pq instead of n³/2pq.) Using the fact that the improved bounds²³ are 
$2\Sigma$ and 

$$\Sigma + \Sigma \times \frac{1 - 2E a^p b^q - 2E a^p b^q/n}{1 + E a^p b^q/10 + E a^p b^q/10n},$$

Price finds the limits 0.6748 (odds of 207 to 100) and 0.7057 (odds of 239 
to 100). 

Price’s investigations led him to conclude that 


In all cases when z is small, and also whenever the disparity 
between p and q is not great $2\Sigma$ is almost exactly the true 
chance required. And I have reason to think, that even in all 
other cases, $2\Sigma$ gives the true chance nearer than within the 
limits now determined. [pp. 323-325] 
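
Price's limits for the example with 100 happenings and 1000 failures can be set beside a direct numerical evaluation of the posterior integral. The sketch below rests on assumptions stated here rather than taken from Price: a uniform prior as in the Essay, the interval 1/11 ± 1/110, and a composite Simpson rule with an arbitrarily chosen number of steps. The function names are illustrative.

```python
from math import exp, lgamma, log


def posterior_density(x: float, p: int, q: int) -> float:
    """(n+1) C(n,p) x^p (1-x)^q, computed through logarithms to avoid overflow."""
    n = p + q
    log_k = lgamma(n + 2) - lgamma(p + 1) - lgamma(q + 1)
    return exp(log_k + p * log(x) + q * log(1.0 - x))


def interval_chance(p: int, q: int, z: float, steps: int = 2000) -> float:
    """Composite Simpson estimate of Pr[p/n - z < x < p/n + z | p, q]."""
    n = p + q
    lo, hi = p / n - z, p / n + z
    h = (hi - lo) / steps  # `steps` must be even
    total = posterior_density(lo, p, q) + posterior_density(hi, p, q)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * posterior_density(lo + i * h, p, q)
    return total * h / 3


chance = interval_chance(100, 1000, 1 / 110)
```

The value obtained should fall close to the upper of Price's two improved limits, in keeping with his remark that $2\Sigma$ is almost exactly the true chance.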


Before leaving the Supplement one might note Price’s footnote to the 
corollary in Article 20. Here he points out that it follows from Article 20 
that, in the case in which neither p nor q is very small (or even not less 
than 10), the probability x of the event satisfies the following: 


(i) $\Pr[p/n - y/\sqrt{2} < x < p/n + y/\sqrt{2}] \approx 1/2$ 

(ii) $\Pr[p/n - y < x < p/n + y] \approx 2/3$ 

(iii) $\Pr[p/n - \sqrt{2}\,y < x < p/n + \sqrt{2}\,y] \approx 5/6$ 

where $y = \sqrt{pq/(n^3 - n^2)}$ is “the point of contrary flexure” [art. 26]²⁴. A 
numerical example, with p = 1000 and q = 100, follows. 

In the chapter on Laplace we shall show that (in the notation of the 
present section)²⁵ 


$$\Pr\left[\frac{p}{n} - \frac{\tau\sqrt{2pq}}{n\sqrt{n}} < x < \frac{p}{n} + \frac{\tau\sqrt{2pq}}{n\sqrt{n}}\right] \approx \frac{2}{\sqrt{\pi}} \int_0^{\tau} e^{-t^2}\,dt. \tag{5}$$


With $\tau = \tfrac12\sqrt{n/(n-1)}$, the left-hand side of (5) becomes the left-hand 
side of (i), and hence the latter is approximately equal to 


$$\frac{2}{\sqrt{\pi}} \int_0^{\frac12\sqrt{n/(n-1)}} e^{-t^2}\,dt = 2\Phi\left(\sqrt{n/2(n-1)}\right) - 1, \tag{6}$$


where Φ(·) is the cumulative distribution function of a random variable 
having the standard Normal distribution. If n is large, then n/(n - 1) ≈ 1, 
and (6) becomes 


$$2\Phi(1/\sqrt{2}) - 1 = 0.5205,$$


which accords reasonably well with (i). 
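
The value of (6) is quickly evaluated with the error function, since $2\Phi(k) - 1 = \operatorname{erf}(k/\sqrt{2})$. The following minimal sketch uses only the Python standard library; the function names are chosen here for illustration.

```python
from math import erf, sqrt


def central_normal_mass(k: float) -> float:
    """2*Phi(k) - 1: standard Normal probability of lying within k of the mean."""
    return erf(k / sqrt(2.0))


def price_case_i(n: int) -> float:
    """Right-hand side of (6): 2*Phi(sqrt(n / (2(n - 1)))) - 1."""
    return central_normal_mass(sqrt(n / (2.0 * (n - 1))))


limit_value = central_normal_mass(1.0 / sqrt(2.0))  # the large-n limit, erf(1/2)
```

For finite n the value of `price_case_i(n)` lies slightly above this limit, approaching it as n increases.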

We have already gleaned from our discussion of the Essay that Price 
probably believed Bayes’s results to be applicable in a causal setting. This 
is given further support by the following passage from the covering letter: 


The solution of the problem enquired after in the papers I have 
sent you has, I think, been hitherto a desideratum in philosophy 
of some consequence. To this we are now in a great measure 
helped by the abilities and skill of our late worthy friend; and 
thus are furnished with a necessary guide in determining the na- 
ture and proportions of unknown causes from their effects, and 
an effectual guard against one great danger to which philoso- 
phers are subject; I mean, the danger of founding conclusions on 
an insufficient induction, and of receiving just conclusions with 
more assurance than the number of experiments will warrant. 


[p. 297] 


As we have seen, however, there is little (if anything) in Bayes’s Essay to 
warrant such an extension of the results. As it is, at all events, no causal 
application is called for in this paper, and the matter is accordingly of no 
importance in the present context. 


5.3.3 Bayes’s Notebook 


In his comments on a paper by Perks [1947], M.E. Ogborn casually men- 
tioned that 


in his own office there was a book which for some time he had 
not been able to place, but when visiting the Royal Society in 
connexion with another matter he had realized that the handwriting 
in the book appeared to be identical with other specimens 
of Bayes’s handwriting. He thought it was, in fact, one of 
Bayes’s notebooks. [Perks 1947, p. 318] 


No further attention was apparently paid to this remark until Holland 
[1962] commented on the relic in a biographical note on Bayes, choosing to 
cite from the contents of the notebook 


a method of “finding the time and place” of the conjunction of 
two planets, some notes on weights and measures, on a method 
of differentiation and a note on logarithms. [p. 457] 


Holland also mentioned the note on an electrifying machine and drew attention 
to the shorthand (a modification by Elisha Coles of one of Thomas 
Shelton’s systems) used by Bayes²⁶. (In addition to the shorthand, Bayes 
used English, French and Latin.) 

Only one passage in the notebook pertains to probability: it is concerned 
with a proof of one of the Rules in Bayes’s Essay, and since it is probably 
not readily available I present, from here to the end of this section, a free 
translation of the original Latin text (for further details see Dale [1986]). 
Some of the formulae are given in a more modern notation, and certain 
obvious lapsus calami have been corrected. Bayes’s first paragraph is un- 
labelled. 


Firstly, let S/V = x. Then $\dot{x} = (S/V)(\dot{S}/S - \dot{V}/V)$. Thus if $\dot{S}/S > \dot{V}/V$ 
and S and V are both increasing (and of the same sign), $\dot{x} > 0$, and so x 
is increasing. Similarly, if y = V/S and S and V are both decreasing (and 
of opposite sign) then V/S is increasing. 

Art. 2. Let 

$$A = (1 - nz/p)^p (1 + nz/q)^q, \qquad B = (1 + nz/p)^p (1 - nz/q)^q,$$

where n = p + q. Then 

$$\int_{-q/n}^{p/n} A\,dz = n^n \Big/ \left[(n+1)\binom{n}{p} p^p q^q\right].$$

The integral of B from z = -p/n to q/n reduces to the same expression. 


Art. 3. With A and B as given above, and D and $\Delta$ defined by 

$$D = (1 - n^2z^2/q^2)^{nq/2p}, \qquad \Delta = (1 - n^2z^2/p^2)^{np/2q},$$

we find that 

$$\dot{B}/B : \dot{D}/D :: (1 - n^2z^2/q^2) : (1 - nz/q)(1 + nz/p) :: (1 + nz/q) : (1 + nz/p).$$

Since q > p and $\dot{D}/D$ is negative, it follows that $\dot{B}/B > \dot{D}/D$. (Examination 
of the subsequent Articles shows that attention is restricted to z > 0.) 
Hence, since B and D are both decreasing functions of z, we find from the 
first Article that B/D is increasing; and since B = D = 1 when z = 0, we 
may conclude that B > D, and similarly that A < $\Delta$. Moreover, since 

$$\Delta^2 = (1 - n^2z^2/p^2)^{np/q} \quad\text{and}\quad AB = (1 - n^2z^2/p^2)^p (1 - n^2z^2/q^2)^q,$$

it follows that 

$$\Delta^2 : AB :: (1 - n^2z^2/p^2)^{p^2/q} : (1 - n^2z^2/q^2)^q.$$

Thus 

$$\Delta^{2q} : (AB)^q :: (1 - n^2z^2/p^2)^{p^2} : (1 - n^2z^2/q^2)^{q^2},$$

and hence $\Delta^2 < AB$ and $2\Delta < A + B$. 
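
The three inequalities of this Article can be spot-checked numerically. The sketch below takes $A = (1 - nz/p)^p (1 + nz/q)^q$, $B = (1 + nz/p)^p (1 - nz/q)^q$, $D = (1 - n^2z^2/q^2)^{nq/2p}$ and $\Delta = (1 - n^2z^2/p^2)^{np/2q}$, with q > p and 0 < z < p/n so that every base is positive; the function names are illustrative and do not occur in the notebook.

```python
def art3_quantities(p: int, q: int, z: float) -> dict[str, float]:
    """A, B, D and Delta for n = p + q and 0 < z < min(p, q)/n."""
    n = p + q
    return {
        "A": (1 - n * z / p) ** p * (1 + n * z / q) ** q,
        "B": (1 + n * z / p) ** p * (1 - n * z / q) ** q,
        "D": (1 - (n * z / q) ** 2) ** (n * q / (2 * p)),
        "Delta": (1 - (n * z / p) ** 2) ** (n * p / (2 * q)),
    }


def art3_inequalities_hold(p: int, q: int, z: float) -> bool:
    """True when B > D, A < Delta and 2*Delta < A + B."""
    v = art3_quantities(p, q, z)
    return (
        v["B"] > v["D"]
        and v["A"] < v["Delta"]
        and 2 * v["Delta"] < v["A"] + v["B"]
    )


checks = [art3_inequalities_hold(3, 7, z) for z in (0.05, 0.1, 0.2, 0.29)]
```

A handful of points proves nothing, of course, but it does guard against slips in the exponents, which are easily misread.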
Art. 4. Let A be an unknown event with prior probability x, and let $A^p_q$ 
denote the event that A has happened exactly p times in n = p + q trials. 
By Proposition 10 of the Essay it follows that, for z > 0, 

1°. 

$$\begin{aligned}
P_1 &= \Pr[p/n - z < x < p/n \mid A^p_q] \\
&= \int_{p/n - z}^{p/n} \binom{n}{p} x^p (1 - x)^q\,dx \Big/ \int_0^1 \binom{n}{p} x^p (1 - x)^q\,dx \\
&= (n+1)\binom{n}{p} \int_0^z (p/n - u)^p (q/n + u)^q\,du \\
&= (n+1)\binom{n}{p} (p^p q^q/n^n) \int_0^z (1 - nu/p)^p (1 + nu/q)^q\,du. \qquad (7)
\end{aligned}$$

Having noticed in Article 3 that 

$$(1 - nu/p)^p (1 + nu/q)^q < (1 - n^2u^2/p^2)^{np/2q}$$

we see that 

$$P_1 \le (n+1)\binom{n}{p} (p^p q^q/n^n) \int_0^z (1 - n^2u^2/p^2)^{np/2q}\,du.$$

2°. Under the same hypotheses it follows that 

$$\begin{aligned}
P_2 &= \Pr[p/n < x < p/n + z \mid A^p_q] \\
&= (n+1)\binom{n}{p} (p^p q^q/n^n) \int_0^z (1 - nu/q)^q (1 + nu/p)^p\,du \qquad (8) \\
&\ge (n+1)\binom{n}{p} (p^p q^q/n^n) \int_0^z (1 - n^2u^2/q^2)^{nq/2p}\,du.
\end{aligned}$$

3°. 

$$\begin{aligned}
P_3 &= \Pr[p/n - z < x < p/n + z \mid A^p_q] \\
&= (n+1)\binom{n}{p} (p^p q^q/n^n) \int_0^z (A + B)\,du \\
&> 2(n+1)\binom{n}{p} (p^p q^q/n^n) \int_0^z (1 - n^2u^2/p^2)^{np/2q}\,du \quad (\text{since } A + B > 2\Delta).
\end{aligned}$$

In subsequent Articles certain approximations to these integrals are obtained. 
Art. 5. From the expression 

$$n! = \sqrt{2\pi}\, n^{n+1/2} e^{-n} e^{\theta/12n}, \qquad 0 < \theta < 1,$$

one finds that 

$$\binom{n}{p} p^p q^q / n^n = \sqrt{\frac{n}{2\pi pq}}\; e^{\theta(pq - np - nq)/12npq}.$$

Since (pq - n²) < 0, we may conclude that 

$$N\sqrt{\frac{n}{2\pi pq}} < \binom{n}{p} p^p q^q / n^n < \sqrt{\frac{n}{2\pi pq}},$$

where ln N = (pq - n²)/12npq. 
(Bayes does not give details of his derivation of this last expression: I have 
tried to reconstruct his argument.) 


Art. 6. 
i (ts n2u? /p?)nP/24 Ae BABU [T((np/2q) + 1]? . 
_ nT ((np/q) +2) 


On our using the approximation
\[ \Gamma(z + 1) \sim \sqrt{2\pi}\, z^{z+1/2} e^{-z} , \]


we obtain 


p/n 2 
: (l—n 24,2 /p?)rP/ 24 Pere (1+ q/np)- (np/qt+3) 
—p/n np+q 


a p 27 pq 
2(np + q) 


Similarly 
anes 


q/n yg 1/2 q g 
1—n?u e u>———- 
[i : 2(nq + p) 


Art. 7. From Articles 6 and 4.3 it follows that
\begin{align*}
\Pr[0 < x < 2p/n \mid A_n^p]
  &> (n+1)\binom{n}{p} (p^p q^q/n^n) \int_{-p/n}^{p/n} (1 - n^2u^2/p^2)^{np/2q} \,du \\
  &> N(n+1)p/(np + q) .
\end{align*}


If p and q are both large, and q > p, then the right-hand side of this last ex- 
pression is approximately 1. Furthermore, under such conditions on p and q, 


Pr[p/n − z < x < p/n + z | $A_n^p$]


p/n 
~ 2(n+ I Vn]mpa) [= ntu2/p?yr ed 
—p/n 
without appreciable error. 
In the last two Articles of his work Bayes turns his attention to the 


evaluation of the integrals in (7) and (8). 
Art. 8. If $x : r :: p : q$ and $(x + r)^n$ is expanded, then


(ieee ? oe = 1: (g/(p+1))(p/q) 


(7) aPre (eer cae (ae is lay &e. 


Similarly 


(7) ares : (per tet | Ls (p/(q + 1))(a/p) 


(") aP } OO ace sd eS (ale? &e. 


Thus 


4 (2)+ (2) 


aq—1)(q—-2)  (p\” 
* @+DE+V—+3) 6 = | / 1 
where the denominator Q is given by


as (3) + atin ($) 


p(p-1)(p-2) fa), 
"G+ D@+ Da+3) 6 i | 


Now if g > p the k-th term of the numerator on the right-hand side of this 
last expression is greater than the corresponding term of the denominator. 
For small p the series in the denominator terminates first, wherefore the 
series in the numerator is greater than that in the denominator. If in fact 1 
is subtracted from the series in the numerator, the series in the denominator 
will be larger than that in the new numerator. 

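Art. 9. Attention is next turned to the evaluation of the integral
\begin{align*}
I &\equiv \int_{-q/n}^{p/n} (n+1)\binom{n}{p} (p^p q^q/n^n)(1 - nz/p)^p (1 + nz/q)^q \,dz \\
  &= \int_{-q/n}^{p/n} (n+1)\binom{n}{p} (p/n - z)^p (q/n + z)^q \,dz \\
  &= \int_{-q/n}^{0} (n+1)\binom{n}{p} x^p r^q \,dz + \int_{0}^{p/n} (n+1)\binom{n}{p} x^p r^q \,dz \\
  &= I_1 + I_2 ,
\end{align*}
say, where $x = p/n - z$, $r = q/n + z$.
To evaluate $I_1$, consider the series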


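\[ V = r^{n+1} + \frac{n+1}{1}\, r^n x + \frac{(n+1)n}{1 \cdot 2}\, r^{n-1} x^2 + \cdots + F r^{q+1} x^p . \]
Since $\dot r = \dot z = -\dot x$,
\begin{align*}
\dot V &= (n+1) r^n \dot r + \frac{n+1}{1}\left[n r^{n-1} \dot r x + r^n \dot x\right] + \frac{(n+1)n}{1 \cdot 2}\left[(n-1) r^{n-2} \dot r x^2 + 2 r^{n-1} x \dot x\right] \\
       &\qquad + \cdots + F\left[(q+1) r^q \dot r x^p + r^{q+1} p x^{p-1} \dot x\right] \\
       &= (n+1) r^n \dot r + \frac{n+1}{1}\left[n r^{n-1} \dot r x - r^n \dot r\right] + \frac{(n+1)n}{1 \cdot 2}\left[(n-1) r^{n-2} \dot r x^2 - 2 r^{n-1} x \dot r\right] \\
       &\qquad + \cdots + F\left[(q+1) r^q \dot r x^p - r^{q+1} p x^{p-1} \dot r\right] \\
       &= (q+1) F r^q x^p \dot r .
\end{align*}
Noting that F is the coefficient of $r^{q+1} x^p$ in the expansion of $(x + r)^{n+1}$,
we find that $(q+1)F = (n+1)\binom{n}{p}$; and hence V can be written in the form
\[ V = r^{n+1} + \binom{n+1}{1} r^n x + \cdots + [(n+1)/(q+1)] \binom{n}{p} r^{q+1} x^p , \]
a series that reduces, when z = 0, to
\[ \frac{(n+1)}{n}\, \frac{q}{q+1} \binom{n}{p} \frac{p^p q^q}{n^n} \left[ 1 + \frac{pq}{(q+2)p} + \frac{p(p-1)q^2}{(q+2)(q+3)p^2} + \cdots \right] . \]
Thus
\begin{align*}
I_1 &= \int_{-q/n}^{0} (n+1)\binom{n}{p} x^p r^q \,dz \\
    &= \int_{0}^{q/n} (n+1)\binom{n}{p} (1 - r)^p r^q \,dr \quad (= V|_{z=0}) \\
    &= (n+1)\binom{n}{p} B_{q/n}(q+1, p+1) ,
\end{align*}
$B_x(b, c)$ denoting the incomplete beta-function²⁷. Similarly
\begin{align*}
I_2 &= \int_{0}^{p/n} (n+1)\binom{n}{p} x^p r^q \,dz \\
    &= (n+1)\binom{n}{p} B_{p/n}(p+1, q+1) .
\end{align*}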


5.3.4 Price’s Four Dissertations 


In 1767 Richard Price published²⁸ a volume entitled Four Dissertations,
these being the following: 


1. On Providence. 
2. On Prayer. 


3. On the Reasons for expecting that virtuous Men shall meet 
after Death in a State of Happiness. 


4. On the Importance of Christianity, the Nature of Historical 
Evidence, and Miracles. 


Only the fourth of these essays contains anything pertinent to our topic: 
in the second section, entitled “The Nature and Grounds of the Regard due 
to Experience and to the Evidence of Testimony, stated and compared” , 
we find some discussion of probability. Although no direct use of Bayes’s 
Theorem is made, Price does quote examples illustrating the results he had 
given in the Appendix to Bayes’s Essay²⁹.
But before considering these examples, it might be of interest to note
Price’s illustration of the influence of knowledge on future observation.
After a long quotation from Hume’s Essay on Miracles³⁰ Price turns to a
consideration of the assurance, given by experience, of the laws of nature. 
“This assurance”, he says, 


is nothing but the conviction we have, that future events will 
be agreeable to what we have hitherto found to be the course of 
nature, or the expectation arising in us, upon having observed 
that an event has happened in former experiments, that it will 
happen again in future experiments. [pp. 389-390] 


This is then illustrated by the following example: 


if I was to draw a slip of paper out of a wheel, where I knew there
were more white than black papers, I should intuitively see, that 
there was a probability of drawing a white paper, and therefore 
should expect this; and he who should make a mystery of such 
an expectation, or apprehend any difficulty in accounting for 
it, would not deserve to be seriously argued with. — In like 
manner; if, out of a wheel, the particular contents of which I
am ignorant of, I should draw a white paper a hundred times 
together, I should see that it was probable, that it had in it more 
white papers than black, and therefore should expect to draw 
a white paper the next trial. There is no more difficulty in this 
case than in the former; and it is equally absurd in both cases
to ascribe the expectation, not to knowledge, but to instinct. 


[pp. 390-391] 


Similar examples, concerned with the tossing of a die and with the hap- 
pening of an event in every trial a million times, are also cited to show that 
an observed frequency should be used as a reasonable predictor for future 
occurrences. 

In a long footnote (stretching over four pages) Price proceeds to the ex- 
amples mentioned above. Although some of these are somewhat similar to 
those given in his Appendix to the Essay, I choose to give them all here in 
detail as they are seldom cited. 


In an essay published in vol. 53d of the Philosophical Transac- 
tions, what is said here and in the last note, is proved by math- 
ematical demonstration, and a method shewn of determining 
the exact probability of all conclusions founded on induction. 
— This is plainly a curious and important problem, and it has 
so near a relation to the subject of this dissertation, that it will 
be proper just to mention the results of the solution of it in a 
few particular cases. 




Suppose, 1st, all we know of an event to be, that it has happened
ten times without failing, and that it is inquired, what reason
we shall have for thinking ourselves right, if we judge, that the
probability of its happening in a single trial, lies somewhere be- 
tween sixteen to one and two to one. — The answer is, that the 
chance for being right, would be .5013, or very nearly an equal 
chance, — Take next, the particular case mentioned above, and 
suppose, that a solid or dye of whose number of sides and con- 
stitution we know nothing, except from experiments made in 
throwing it, has turned constantly the same face in a million of 
trials. — In these circumstances, it would be improbable, that 
it had less than 1,400,000 more of these sides or faces than of
all others; and it would be also improbable, that it had above
1,600,000 more. The chance for the latter is .4647, and for the 
former .4895. There would, therefore, be no reason for think- 
ing, that it would never turn any other side. On the contrary, it 
would be likely that this would happen in 1,600,000 trials. — In 
like manner, with respect to any event in nature, suppose the 
flowing of the tide, if it has flowed at the end of a certain inter- 
val a million of times, there would be the probability expressed 
by .5105, that the odds for its flowing again at the usual period 
was greater than 1,400,000 to 1, and the probability expressed 
by .5352, that the odds was less than 1,600,000 to one. 

Such are the conclusions which uniform experience warrants. 
—— What follows is a specimen of the expectations, which it is 
reasonable to entertain in the case of interrupted or variable 
experience. — If we know no more of an event than that it has 
happened ten times in eleven trials, and failed once, and we 
should conclude from hence, that the probability of its happen- 
ing in a single trial lies between the odds of nine to one and 
eleven to one, there would be twelve to one against being right. 
— If it has happened a hundred times, and failed ten times, 
there would also be the odds of near three to one against being 
right in such a conclusion. — If it has happened a thousand 
times and failed a hundred, there would be an odds for be- 
ing right of a little more than two to one. And, supposing the
same ratio preserved of the number of happenings to the num- 
ber of failures, and the same guess made, this odds will go on 
increasing for ever, as the number of trials is increased. — He 
who would see this explained and proved at large may consult 
the essay in the Philosophical Transactions, to which I have re- 
ferred; and also the supplement to it in the 54th volume. — The 
specimen now given is enough to shew how very inaccurately 
we are apt to speak and judge on this subject, previously to 
calculation. ... It also demonstrates, that the order of events 
in nature is derived from permanent causes established by an
intelligent Being in the constitution of nature, and not from 
any of the powers of chance. And it further proves, that so far 
is it from being true, that the understanding is not the faculty 
which teaches us to rely on experience, that it is capable of 
determining, in all cases, what conclusions ought to be drawn 
from it, and what precise degree of confidence should be placed 


in it. [pp. 395-398] 
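Price's figures can be recovered, very nearly, under the assumption of a uniform prior for the unknown probability x, as in Bayes's Essay. The sketch below is an illustration under that assumption only, and its function names are hypothetical. With n successes in n trials the posterior distribution function is $x^{n+1}$; with one failure in n trials it is $(n+1)x^n - nx^{n+1}$.

```python
def odds_to_prob(a, b=1):
    """Probability corresponding to odds of a to b in favour."""
    return a / (a + b)


def pr_between_unbroken_run(n, lo, hi):
    """Pr[lo < x < hi | n successes in n trials], uniform prior assumed."""
    return hi ** (n + 1) - lo ** (n + 1)


def pr_between_one_failure(n, lo, hi):
    """Pr[lo < x < hi | n - 1 successes and 1 failure], uniform prior assumed."""
    cdf = lambda x: (n + 1) * x ** n - n * x ** (n + 1)
    return cdf(hi) - cdf(lo)


# Ten happenings without failure; odds between two to one and sixteen to one.
ten_times = pr_between_unbroken_run(10, odds_to_prob(2), odds_to_prob(16))
# A million happenings; odds greater than 1,400,000 to 1, and less than 1,600,000 to 1.
tide_above = pr_between_unbroken_run(10 ** 6, odds_to_prob(1_400_000), 1.0)
tide_below = pr_between_unbroken_run(10 ** 6, 0.0, odds_to_prob(1_600_000))
# Ten happenings in eleven trials; odds between nine to one and eleven to one.
one_failure = pr_between_one_failure(11, odds_to_prob(9), odds_to_prob(11))
odds_against = (1 - one_failure) / one_failure

print(ten_times, tide_above, tide_below, odds_against)
```

The values so obtained agree with Price's .5013, .5105, .5352 and "twelve to one" to within a unit or so in the third decimal place.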


In a further footnote [pp. 440-452], Price provides two definitions and 
two propositions concerned with probability. These are as follows: 


Definition 1st. An event is probable, when the odds for its happening
are greater than those against its happening; improbable,
when the odds against are greater than those for; and neither
probable nor improbable when these odds are equal. — This is
the proper sense of these words; but the writers on the doctrine
of chances use the word probable in a more general sense.
Definition 2nd. Two events are independent, when the happening
of one of them has no influence on the other.

Proposition 1st. The improbabilities of independent events are
the same whether they are considered jointly or separately. That
is; the improbability of an event remains the same, whether any
other event which has no influence upon it happens at the same
time with it, or not. This is self-evident³¹.

Proposition 2nd. The improbability that two independent events, 
each of them not improbable, should both happen, cannot be 
greater than the odds of three to one; this being the odds that 
two equal chances shall not both happen; and an equal chance 
being the lowest event of which it can be said that it is not 
improbable. 


On reading these definitions and propositions one is struck by the difference
between the carefully phrased text by Bayes and the looser and more
colloquial statements given by Price.

Price’s first definition is unexceptionable. If we denote by $O_F$ and $O_A$
respectively the odds in favour of and against some event E, then from

\[ \Pr[E] = \frac{O_F}{O_F + O_A} \]

it follows immediately that

\[ O_F > O_A \implies \Pr[E] > \tfrac{1}{2} , \]

or E is probable. Similar results obtain for events that are improbable or
that are neither probable nor improbable.
The definition of independence adopted here by Price is reminiscent of
that given earlier by de Moivre, in the third edition of whose Doctrine of 
Chances we read 


Two Events are independent, when they have no connexion one 
with the other, and that the happening of one neither forwards 
nor obstructs the happening of the other. [1756, p. 6] 


Notice that this definition, like Price’s, makes no mention of probability. 

In his first proposition Price writes of “the improbability of an event”. 
Now, while he has carefully defined, as we have already noted, probable and 
improbable events, the improbability of an event is not defined. Whereas 
an improbable event E is one for which Pr[E] < 1/2, the improbability
of E can clearly be any number in [0,1] (see Price’s second proposition
and his earlier remarks quoted here). It seems, then, that, adopting a less 
pessimistic term than that advocated by Price, one might well consider the 
improbability of an event as being its probability. With this interpretation 
Price’s first proposition is seen to be in line with a résumé following de 
Moivre’s earlier definition, in which we read 


the Probability of the happening of several Events independent, 
is the product of all the particular Probabilities whereby each 
particular Event may be produced [1756, p. 21], 


and this phrasing, incorporating “probability” into “independence”, is in 
keeping with the definition of independence given by Bayes (see §4.3). 

Passing on to Proposition 2nd., we note that Price, unlike Bayes, does not 
equate chance with probability — the latter in fact wrote “By chance I mean 
the same as probability” [Bayes, 1763a, p. 376]. Price’s writing “an equal 
chance being the lowest event” received later support from Emerson, who 
wrote “Chance is an event” [1776, p. 2] (see §5.7). If, as in our examination 
of Price’s first proposition, we take the improbability of an event to be 
conterminous with its probability, then the arguments presented here may 
be given symbolically as follows: 


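\begin{align*}
\Pr[E] \ge \tfrac{1}{2} \;\wedge\; \Pr[F] \ge \tfrac{1}{2}
  &\implies \Pr[E]\Pr[F] \ge \tfrac{1}{4} \\
  &\implies \Pr[EF] \ge \tfrac{1}{4} \\
  &\implies 1 - \Pr[EF] \le \tfrac{3}{4} \\
  &\implies \Pr[\overline{EF}] \le \tfrac{3}{4} ,
\end{align*}

with equality if and only if Pr[E] = Pr[F] = 1/2.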
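The passage from odds to probability, and the bound of Price's second proposition, may be illustrated in exact arithmetic. This is a small sketch only; the function names are hypothetical, and independence of the two events is assumed, as in the proposition.

```python
from fractions import Fraction


def prob_from_odds(in_favour, against):
    """Pr[E] = O_F / (O_F + O_A)."""
    return Fraction(in_favour, in_favour + against)


def prob_not_both(pr_e, pr_f):
    """Probability that two independent events do not both happen."""
    return 1 - pr_e * pr_f


half = prob_from_odds(1, 1)            # an "equal chance"
print(prob_not_both(half, half))       # the extreme case: 3/4, i.e. odds of three to one
print(prob_not_both(prob_from_odds(2, 1), prob_from_odds(3, 1)))
```

Any two events that are each "not improbable" give a value no greater than 3/4, the bound being attained only for two equal chances.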
5.4 John Michell (1724-1793)


Three years after the posthumous publication of Bayes’s Essay, Michell³²
(described in Manuscript xxxiii, 156, in the William Cole collection as “a
little short man, of a black complexion and fat”) published a paper³³ entitled
An Inquiry into the probable Parallax, and Magnitude of the fixed
Stars, from the Quantity of Light which they afford us, and the particular
Circumstances of their Situation. Although Michell’s argument is markedly
similar to that used by Arbuthnott³⁴ in 1710 in his essay³⁵ in which an argument
for divine providence is put forward on the basis of an observed
constant regularity in the birth rates of the two sexes, and to that of Daniel 
Bernoulli in his prize-winning essay of 1734 on the attribution to chance of 
the inclinations to the ecliptic of the planetary orbits, inasmuch as it can 
perhaps be interpreted as a significance test, many of those who examined 
Michell’s memoir in the nineteenth century found in it an application of 
inverse probability. Thus it is expedient to pay some attention to the mem- 
oir here, the particularly relevant section being found on pages 243-250. 
The assertion Michell proposes to prove is the following: 


that, from the apparent situation of the stars in the heavens, 
there is the highest probability, that, either by the original act 
of the Creator, or in consequence of some general law (such 
perhaps as gravity) they are collected together in great numbers 
in some parts of space, whilest in others there are either few or 
none. [p. 243] 


The method to be used in order to prove this assertion 


is of that kind, which infers either design, or some general law, 
from a general analogy, and the greatness of the odds against 
things having been in the present situation, if it was not owing 
to some such cause. [p. 243] 


The first thing to be examined is “what it is probable would have been 
the least apparent distance of any two or more stars, any where in the 
whole heavens”, it being always supposed that “they had been scattered 
by mere chance, as it might happen” [p. 243]. Consider firstly two stars A 
and B: the probability that B will be within a distance of one degree of A 
is the ratio of the area of a circle of one degree angular radius to the area 


of the sphere (of radius R) of fixed stars, i.e. working in radians³⁶,

\[ \pi (2\pi R/360)^2 / 4\pi R^2 , \]

which reduces to 0.000076154 or 1/13,131. Thus the probability that B is
not found within one degree of A is 13,130/13,131. Furthermore,


because there is the same chance for any one star to be within 
the distance of one degree from any given star, as for every 
other [p. 244], 
the probability that none of n stars will be within one degree of A is
$(13{,}130/13{,}131)^n$, while the complement of this quantity to 1 is the probability
that at least one of the n stars is within the given distance of A.
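Michell's figure is readily checked. The sketch below simply evaluates the ratio given above, in which the small circle is treated as a flat disc; the function names and the choice n = 230 in the usage line are illustrative assumptions.

```python
from math import pi


def disc_fraction(radius_deg):
    """Michell's ratio pi (2 pi R d / 360)^2 / (4 pi R^2) = (pi d / 360)^2."""
    return (pi * radius_deg / 360) ** 2


def prob_at_least_one_within(n, radius_deg=1.0):
    """Probability that at least one of n stars lies within the given distance of A."""
    return 1 - (1 - disc_fraction(radius_deg)) ** n


print(disc_fraction(1.0), 1 / disc_fraction(1.0))   # about 0.000076154 and 13,131
print(prob_at_least_one_within(230))
```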

Wishing now to abandon the significance given to the star A, Michell 
states that 


because the same event is equally likely to happen to any one 
star as to any other, and therefore any one of the whole number 
of stars n might as well have been taken for the given star as 
any other [p. 244], 


it follows that the probability that no two of the n stars are within one
degree of each other is $[(13{,}130)^n/(13{,}131)^n]^n$: we shall comment on the
correctness of this statement later.

It follows similarly that to find the probability that, of n stars, no two
stars should be one within the distance x and the other within the distance
z of a given star, one must firstly consider the fractions

\[ \alpha = \frac{(6875.5')^2 - x^2}{(6875.5')^2} , \qquad \beta = \frac{(6875.5')^2 - z^2}{(6875.5')^2} \]

(the denominators being the square of 2 radians, in minutes) which give
the probabilities that no star is within the distances x and z of the given
star. Since


the probability that two events shall both happen, is the product
of the respective probabilities of those two events multiplied
together [p. 245],


it follows that the probability that one star is within a distance x of the
given star, and that another is within a distance z of that same star
is $(1 - \alpha)(1 - \beta)$. And finally, the probability that of n stars, no two
exist that are within respective distances x and z of the same star, is
$[1 - (1 - \alpha)(1 - \beta)]^n$.

Two examples follow. In the first of these Michell finds the probability³⁷


that no two stars, in the whole heavens, should have been within
so small a distance from each other, as the two stars β Capricorni,
to which I shall suppose about 230 stars only to be equal
in brightness. [p. 246] 


Under the supposition that the distance between these stars is something
less than 3⅓′, the required probability is found to be

\[ \left[ 1 - \pi \left( 2\pi R \cdot 3\tfrac{1}{3} / (360 \times 60) \right)^2 / 4\pi R^2 \right]^{230 \times 230} , \]

or 80/81.
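The arithmetic of this first example can be reproduced as follows. The sketch assumes the exponent $n^2$ with n = 230, in keeping with Michell's expression $[(13{,}130)^n/(13{,}131)^n]^n$, and uses Michell's own fraction 1/4254603; the function names are hypothetical.

```python
from math import pi


def disc_fraction_arcmin(radius_arcmin):
    """Fraction of the sphere within the given angular distance (flat-disc ratio)."""
    return (pi * radius_arcmin / (360 * 60)) ** 2


def michell_no_close_pair(n, fraction):
    """Michell's probability [(1 - fraction)^n]^n that no two of n stars are so close."""
    return (1 - fraction) ** (n * n)


print(1 / disc_fraction_arcmin(10 / 3))            # about 4,254,517
print(michell_no_close_pair(230, 1 / 4254603))     # about 0.98764, close to 80/81
```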
In the second example Michell considers the six brightest stars of the
Pleiades, the stars Taygeta, Electra, Merope, Alcyone and Atlas being respectively
at distances 11, 19½, 24½, 27 and 49 minutes from Maia. Supposing
the number of stars “which are equal in splendor to the faintest of
these” [p. 246] to be 1,500, Michell finds the odds to be almost³⁸ 500,000
to 1


that no six stars, ... scattered at random, in the whole heavens, 
should be within so small a distance from each other as the 
Pleiades are. [p. 246] 


Michell states further that the same argument will be found to be “still 
infinitely more conclusive” [p. 249] if extended to smaller stars and those 
in clusters. 


We may from hence, therefore, with the highest probability 
conclude (the odds against the contrary opinion being many 
million millions to one) that the stars are really collected to- 
gether in clusters in some places, where they form a kind of 
systems, whilst in others there are either few or none of them, 
to whatever cause this may be owing, whether to their mutual 
gravitation, or to some other law or appointment of the Cre- 
ator. And the natural conclusion from hence is, that it is highly 
probable in particular, and next to a certainty in general, that 
such double stars, &c. as appear to consist of two or more stars 
placed very near together, do really consist of stars placed near 
together, and under the influence of some general law, whenever 
the probability is very great, that there would not have been 
any such stars so near together, if all those, that are not less 
bright than themselves, had been scattered at random through 
the whole heavens. [pp. 249-250] 


Thus far the relevant work. 

Had Michell contented himself with stopping before the last quotation, 
his work would in all probability have been seen as an early significance 
test, and we should have been spared much of the ensuing controversy. 
But the passage quoted above suggests strongly that Michell thought the 
strength of his argument to be measurable, and his work came to be seen 
as an application of inverse probability. 

In 1827 Struve proposed a completely different argument, which ran as
follows. The number of possible binary combinations of n stars being $\binom{n}{2}$,
the chance that any pair falls within a small circle of area s is $\binom{n}{2} s/S$, where
S is a given area of the celestial sphere. As a special case Struve considered
the surface from −15° declination to the north pole (so $S = 4\pi \sin^2 52\tfrac{1}{2}^\circ$),
with n = 10229 and x = 4″, where x is the radius of the small circle.
He evaluated the above expression as 0.007814. Struve also considered the
cases in which x = 8, 16 and 32 seconds, and discussed similar results for
the triple star problem.
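Struve's evaluation can be verified directly. In the following sketch the small circle is taken as a flat disc of area $\pi x^2$, with x in radians on a sphere of unit radius; these conventions, and the function names, are assumptions made for the illustration.

```python
from math import comb, pi, radians, sin


def struve_pair_chance(n, radius_arcsec, S):
    """Struve's C(n, 2) * s / S, with s = pi * x^2 and x converted to radians."""
    x = radians(radius_arcsec / 3600)
    return comb(n, 2) * pi * x * x / S


# Surface from -15 degrees declination to the north pole, on a unit sphere.
S = 4 * pi * sin(radians(52.5)) ** 2

print(struve_pair_chance(10229, 4, S))   # about 0.007814
```

Since the expression is proportional to $x^2$, each doubling of the radius (to 8, 16 and 32 seconds) multiplies the value by four.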

For β Capricorni Michell takes n = 230 and x = 3⅓′. In this case³⁹
s/S = 1/4254517, which Michell in fact takes as 1/4254603. Application of
Struve’s formula to Michell’s figures yields

\[ 1 - \frac{1}{2} \cdot \frac{230 \times 229}{4254603} = 160.6/161.6 \]

(rather than Michell’s 80/81), a figure that Lupton [1888a, p. 273] interprets
as “the probability that no two such stars fall within the given area.”

An endorsement of Michell’s argument appeared in 1849 in J.F.W. Herschel’s
Outlines of Astronomy. Here the example of the Pleiades is rehearsed,
though Herschel finds Michell’s estimate of 1,500 stars to be “considerably
too small” [1873, art. 833]. Citing also Struve’s Catalogus novus
stellarum duplicium et multiplicium of 1827, Herschel finds⁴⁰ that “The
conclusion of a physical connexion of some kind or other is therefore unavoidable.”

Comment on Herschel’s work followed swiftly. In the same year in a short
letter to the editors of the Philosophical Magazine and Journal of Science,
J.D. Forbes⁴¹ wrote



Now I confess my inability to attach any idea to what would 
be the distribution of stars or of anything else, if “fortuitously 
scattered,” much more must I regard with doubt and hesita- 
tion an attempt to assign a numerical value to the antecedent 
probability of any given arrangement or grouping whatever. An 
equable spacing of the stars over the sky would seem to me to 
be far more inconsistent with a total absence of Law or Princi- 
ple, than the existence of spaces of comparative condensation, 
including binary or more numerous groups, as well as of regions 
of great paucity of stars. [pp. 132-133] 


In his 1850 review of Quetelet’s Lettres a S.A.R. le Duc régnant de Saze- 
Cobourg et Gotha sur la Théorie des Probabilités appliquée aux Sciences 
Morales et Politiques Herschel, mentioning neither Michell nor Forbes, in 
an attempt to clear up 


a singular misconception of the true incidence of the argument 
from probability which has prevailed in a quarter where we 
should least have expected to meet it [p. 36], 


indicated the inductive nature of the argument for a physical connexion 
between stars and its independence of any calculations. It seems, however, 
that Herschel’s argument was misaimed, and, reasonable though it was, it 
did not invalidate Forbes’s reasoning. As Gower [1982] has pointed out, 
the difference between the two revolved around the meaning of terms like 
“random scattering” . 
Forbes could not let this go unremarked, and on the 6th of August 1850
he read a paper on the matter before the Physical Section of the British 
Association, an expanded version being published in the Philosophical Mag- 
azine and Journal of Science in the same year. The aim of this paper is 
expressly stated in the sixth article as follows: 


the argument which I have to state is not meant to controvert 
the truth of the general result at which Mitchell [sic] and Struve 
arrive, namely, that the proximity of many stars to one spot, or 
the occurrence of many close binary stars distributed over the 
heavens, raises a probability, or rather we would call it an inductive
argument, feeble perhaps, but still real, that such proximity
may be actual, not merely apparent; but I deny that such prob- 
able argument is capable of being expressed numerically at all. 
[p. 403] 


Two main objections are raised to Michell’s work, these being summarized 
as follows: 


First, a confusion between the expectation of a given event in 
the mind of a person speculating about its occurrence, and an 
inherent improbability of an event happening in one particular 
way when there are many ways equally possible. Secondly, a too 
limited and arbitrary conception of the utterly vague premiss 
of stars being “scattered by mere chance, as it might happen;” 
— a statement void of any condition whatever. [pp. 421-422] 


In a Note to his paper Forbes takes exception to Michell’s expression
$[(13130)^n/(13131)^n]^n$ for the probability that no two of n stars are within
one degree of each other. With the assistance of “a mathematical friend,
whose skill in these matters gives the utmost attainable assurance of his
accuracy” [p. 425], Forbes proposed to consider n (the number of stars)
dice, each having p sides. Then the chance of doublets when the dice are
thrown simultaneously is equal to that of two stars “being found at a less
distance than the radius of a small circle of the sphere which includes an
area 1/p-th of the entire surface of the sphere” [p. 425]. The total number
of arrangements, without repetition, being $p(p-1)\cdots(p-n+1)$, and the
total number of outcomes being $p^n$, the probability of an outcome without
repetition is $p(p-1)\cdots(p-n+1)/p^n$, and the chance that two or more
dice show the same face is

\[ 1 - p(p-1)\cdots(p-n+1)/p^n . \]

Using Michell’s figures for β Capricorni, with p = 4254603 and n = 230, and
approximating this last expression (using the Stirling-de Moivre theorem)
by

\[ 1 - \frac{1}{e^n} \left( \frac{p}{p-n} \right)^{p-n+1/2} , \tag{9} \]

Forbes obtains, for the required probability, a value of 0.00617 (a modern
calculation yields 0.00625709), or approximately⁴² 1/160. This agrees
closely with the value 0.00618977 = 1/161.6 obtainable from Struve’s formula.
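Forbes's dice problem is what is now usually called the birthday problem, and both the exact product and the Stirling form (9) are easily evaluated. The sketch below is an illustration only: the function names are hypothetical, and logarithms are used merely to avoid loss of precision in floating-point arithmetic.

```python
from math import comb, exp, expm1, log, log1p


def prob_repeat_exact(p, n):
    """1 - p(p-1)...(p-n+1)/p^n, accumulated through logarithms."""
    log_no_repeat = sum(log1p(-k / p) for k in range(n))
    return -expm1(log_no_repeat)


def prob_repeat_stirling(p, n):
    """Expression (9): 1 - e^(-n) (p/(p-n))^(p-n+1/2); p need not be an integer."""
    return 1 - exp((p - n + 0.5) * log(p / (p - n)) - n)


def struve_value(p, n):
    """Struve's formula with s/S = 1/p: C(n, 2)/p."""
    return comb(n, 2) / p


p, n = 4254603, 230
print(prob_repeat_exact(p, n), prob_repeat_stirling(p, n), struve_value(p, n))
```

With Michell's figures the exact expression and (9) both give about 0.00617, as Forbes found, while Struve's formula gives 0.00618977.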

Sheynin [1984, p. 163] declares that the number of spherical surfaces
(Forbes’s p) should be taken as

\[ 13{,}131\,(60/3.2)^2 = 4{,}616{,}367.189 = 1.085p . \]

With this value equation (9) yields 0.00568856, or approximately 1/175.8.

That Forbes’s work excited much discussion is shown by the letters
reprinted in Shairp et al. [1873]. In a letter to Forbes dated 5th September
1850, Kelland pointed out that the approximation of Forbes’s

\[ 1 - p(p-1)\cdots(p-n+1)/p^n \]

by

\[ 1 - (p - n/2)^n/p^n \]

was unsatisfactory. Further letters, both in support of (from Terrot and
Ellis) and against (from Airy) Forbes’s argument, are well worth reading,
though the controversy is perhaps most fairly expressed in a chapter written
by Tait in Shairp et al. [1873], where we find the words


Forbes ... hit upon a real blot in Mitchell’s argument, and 
rightly denounced its revival in Sir John Herschel’s justly cel- 
ebrated text-book. But they [the extracts quoted by Tait] also 
show that in dealing with the subject, he fell, at first at least, 
into mistakes quite as grave as those he was endeavouring to 
expose. [p. 485] 


In 1851 Boole entered the controversy. As he saw it, the statement of 
Michell’s problem in relation to β Capricorni was as follows:


1. Upon the hypothesis that a given number of stars have been 
distributed over the heavens according to a law or manner whose 
consequences we should be altogether unable to foretell, what is 
the probability that such a star as β Capricorni would nowhere
be found? 

2. Such a star as β Capricorni having been found, what is the
probability that the law or manner of distribution was not one 
whose consequences we should be altogether unable to foretell? 
[1851a, pp. 522-523] 


Boole went on to say that 


The first of the above questions certainly admits of a perfectly 
definite numerical answer [1851a, p. 523], 


an opinion with which Forbes, as we have already noted, violently disagreed. 
After some discussion, Boole reformulated Michell’s problem as follows:


There is a calculated probability p in favour of the truth in a 
particular instance of the proposition, If a condition A has pre- 
vailed, a consequence B has not occurred. Required the similar 
probability for the proposition, If a consequence B has occurred, 
the condition A has not prevailed. [1851a, p. 523]


Using “A” to denote the prevailing of the condition A, and “B” to denote 
the occurrence of the consequence B, Hailperin [1988, p. 167] re-writes this 
in the form 


Given Pr[A → ¬B] = p, find Pr[B → ¬A].


He notes too that Boole in fact treats these probabilities as though they 
were conditional probabilities rather than probabilities of conditionals, the 
relationship between these two being given (by Hailperin) as 


Pr{A|B] = ae | 
with 
Pr[A|B] = eS Pr[B ——> Al a 


when Pr[B] # 0. 

Denoting his two probabilities by p and P respectively, Boole finds p to
be a determined number, and finds the fallacy to lie in the identification of
p and P. (The same observation had earlier been made to Forbes by Bishop
Terrot; see Shairp et al. [1873, p. 476].) Rewriting p and P as $\Pr[\bar B \mid A]$
and $\Pr[\bar A \mid B]$, one has, by the discrete form of Bayes’s Theorem,

\[ P = \Pr[B \mid \bar A] \Pr[\bar A] \,/\, \left( \Pr[B \mid A] \Pr[A] + \Pr[B \mid \bar A] \Pr[\bar A] \right) . \]

As a special case Boole considers p = 159/160 (which he considers to be the
correct value, rather than 80/81), $\Pr[A] = \tfrac{1}{2} = \Pr[B \mid \bar A]$. It then follows
that P = 80/81, as Michell in fact found, but for p rather than P! Boole
also notes that Forbes had justly contended against the identification of p
with P: Hailperin [1986, p. 357] suggests that this opinion attributes more
credit to Forbes than is deserved.
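Boole's special case may be confirmed in exact arithmetic. The following minimal sketch merely evaluates the discrete form of Bayes's Theorem given above; the function and parameter names are hypothetical.

```python
from fractions import Fraction


def prob_not_A_given_B(p, pr_A, pr_B_given_not_A):
    """P = Pr[not-A | B], where p = Pr[not-B | A], so that Pr[B | A] = 1 - p."""
    numerator = pr_B_given_not_A * (1 - pr_A)
    return numerator / ((1 - p) * pr_A + numerator)


P = prob_not_A_given_B(Fraction(159, 160), Fraction(1, 2), Fraction(1, 2))
print(P)   # 80/81
```

Other choices of Pr[A] and of the probability of B when A has not prevailed give other values of P for the same p, which is the substance of Boole's objection to identifying the two.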

Boole’s own solution in fact runs in full as follows: 


Let us state Mr. Mitchell’s problem, as we may now do, in 
the following manner:— There is a calculated probability p in 
favour of the truth in a particular instance of the proposition, If 
a condition A has prevailed, a consequence B has not occurred. 
Required the similar probability for the proposition, If a conse- 
quence B has occurred, the condition A has not prevailed. 
Now, the two propositions are logically connected. The one
is the “negative conversion” of the other; and hence, if either 
is true universally, the other is so. It seems hence to have been 
inferred, that if there is a probability p in a special instance in 
favour of the former, there is the same probability p in favour of 
the latter. But this inference would be quite erroneous. It would 
be an error of the same kind as to assert that whatever proba- 
bility there is that a stone arbitrarily selected is a mineral, there 
is the same probability that a mineral arbitrarily selected is a 
stone. But that these probabilities are different will be evident 
from their fractional expressions, which are — 


\[ \frac{\text{Number of stones which are minerals}}{\text{Number of stones}} , \qquad \frac{\text{Number of non-minerals which are not stones}}{\text{Number of non-minerals}} . \]

It is true that if either of these fractions rises to 1, the other 
does also; but otherwise, they will, in general, differ in value. 


[1851a, p. 523] 


Now it is clear that Boole is here confusing (a) the probability of a 
conditional with (b) a conditional probability (cf. §8.17 and Hailperin [1986, 
p. 358]): however the argument is saved by his correct treatment of (a) as 
(b). The probability P is given by 


\[ P = \frac{c(1 - a)}{c(1 - a) + a(1 - p)} , \]


where, Boole notes, 


c and a are arbitrary constants, whose interpretation is as fol- 
lows: viz. a is the probability of the fulfilment of the condition 
A, c is the probability that the event B would happen if the 
condition A were not satisfied. [1851a, p. 528] 


This result is in fact nothing more than a version of Bayes’s Rule, and is
comparable to our expression for P derived above.

Boole’s example shows clearly that the inference from Pr[¬B | A] = p to
Pr[¬A | B] = p is invalid. Thus


since Michell’s argument does employ conditional probabilities 
and not conditionals, Boole’s criticism of it is justified. His er- 
roneous belief that conditional propositions are involved is im- 
material to the point which he wishes to make. [Hailperin, 1988: 
p. 168] 


It is also worth noting that this paper, viz. Boole [1851a], plays an
important role in probability logic: indeed, Hailperin [1988] remarks that


[it] is noteworthy as having pointed out that ‘contraposing’ a
conditional probability, i.e. the equating of the probability of
A, if B, with the probability of not-B, if not-A, is not valid.
This is no small accomplishment since there was no clear
understanding — even by Boole — of the difference between a
conditional probability, P(A|B), and the probability of a
logical conditional, P(B → A). [p. 167]


Hailperin [1996] has noted that Boole did not make clear that Michell’s 
argument needs only the special (valid) case p = 1 of the inference 


Pr[¬B | A] = p therefore Pr[¬A | B] = p.


Boole reconsiders the problem, though without making any further spe- 
cific comments, in Chapter XX of An Investigation of The Laws of Thought 
of 1854. 

In the second part of his paper of 1851 Donkin presents a Bayesian ap- 
proach to Michell’s problem. He supposes that there are n visible stars of 
a certain class, for no two of which, were they within a certain angular 
distance of each other, could any conclusion be drawn from their appar- 
ent brightness as to whether they were merely optically double or actually 
formed a true binary system⁴³. Suppose further that there are in fact m
pairs of stars within these angular limits, the other n — 2m being single, 
and let p denote the a priori probability that a proposed system is binary 
(a system is defined to be either a single star or a binary system). Then, 
all systems but single and binary being excluded, 1 — p is the a priori 
probability of a single system. Donkin explains p as follows: 


Suppose a person to be perfectly acquainted with the mode in 
which the stars are produced; he would be able, setting aside 
difficulties of calculation, to assign the probability that a system 
about to be produced would turn out to be binary, and this would 
be the value of p. [pp. 462-463] 


It is assumed further that p is uniformly distributed over the unit interval. 

Now let $P^n_i$ denote the a priori probability that there are i binary
systems among n stars, and let $Q^s_r$ denote the a priori probability that there
are r optically double pairs among s single stars “whose configurations were
accidental” [p. 463]. The aim is to determine the posterior probability of
i. Donkin’s reasoning is somewhat loose, no clear distinction between joint
and conditional probabilities being observed. In an attempt to put things
on a firmer footing, let us denote by $A^n_m$ the event that there are m pairs
among the n stars, by $B^m_i$ the event that there are i binary stars among m
stars, and by $C^m_i$ the event that there are i optically double stars among
m. Furthermore let us replace Donkin’s p by P. Then

\[
\begin{aligned}
\Pr[A^n_m \mid P = p] &= \sum_i \Pr[B^m_i \cap C^m_{m-i} \mid P = p] \\
&= \sum_i \Pr[B^m_i \mid P = p]\,\Pr[C^m_{m-i} \mid B^m_i \;\&\; P = p] \\
&= \Phi(p), \text{ say.}
\end{aligned}
\]

Therefore

\[ \Pr[A^n_m] = \int_0^1 \Phi(p)\, f(p)\, dp = \Psi , \]

where f(·) denotes the (uniform) density of P, and hence

\[ \Pr[p < P < p + dp \mid A^n_m] = [\Phi(p)/\Psi]\, dp . \]

Notice further that

\[
\begin{aligned}
\Pr[B^m_i \;\&\; p < P < p + dp \mid A^n_m]
&= \Pr[p < P < p + dp \mid A^n_m] \times \Pr[B^m_i \mid A^n_m \;\&\; p < P < p + dp] \\
&= \frac{\Phi(p)\,dp}{\Psi}\, \Pr[B^m_i \cap A^n_m \mid p < P < p + dp] \big/ \Pr[A^n_m \mid p < P < p + dp] \\
&= \frac{\Phi(p)\,dp}{\Psi}\, \Pr[B^m_i \cap C^m_{m-i} \mid p < P < p + dp] \big/ \Pr[A^n_m \mid p < P < p + dp] \\
&= \frac{\Phi(p)\,dp}{\Psi}\, P^n_i\, Q^{n-2i}_{m-i} \big/ \Phi(p) \\
&= \frac{dp}{\Psi}\, P^n_i\, Q^{n-2i}_{m-i} .
\end{aligned}
\]

Thus

\[ \Pr[B^m_i \mid A^n_m] = \frac{1}{\Psi} \int_0^1 P^n_i\, Q^{n-2i}_{m-i}\, dp . \]

Denoting this last integral by ψ(i), Donkin points out that one may
equivalently write

\[ \Pr[B^m_i \mid A^n_m] = \psi(i) \Big/ \sum_i \psi(i) . \]


Turning now to the evaluation of $P^n_i$ and $Q^s_r$, Donkin notes firstly that,
were k systems about to be produced, the probability that i would turn
out to be binary would be $p^i q^{k-i}$, where q = 1 − p. If n stars have been
produced, and if no knowledge of the division into systems is available, the
probability of i binary stars will be proportional to $\binom{n-i}{i} p^i q^{n-2i}$,
and hence

\[ P^n_i = \binom{n-i}{i} p^i q^{n-2i} \Big/ \sum_{i=0}^{\nu} \binom{n-i}{i} p^i q^{n-2i} , \]

where ν = n/2 or (n − 1)/2 according as n is even or odd.

Secondly he points out that


The a priori probability that two given stars, whose positions
were accidental, would be within a given angular distance θ of
one another, is sin²(θ/2) [p. 465],


though it appears, from Michell’s paper, that this factor should be divided
by four. Donkin then, like Forbes, considers s dice, each having t faces,
where t is the nearest integer to 1/sin²(θ/2). The probability of getting
doublets with a given pair of dice is then sin²(θ/2), and it is then suggested
that $Q^s_r$ be approximated by the probability of getting, in one trial with the
s dice, r different doublets and s − 2r different numbers. From an earlier
article of his paper (not discussed here), this probability is found to be

\[ \frac{t(t-1)\cdots(t-(s-r)+1)}{t^{s}} \times \left(\frac{1}{2}\right)^{r} \frac{s!}{r!\,(s-2r)!} . \]


Notice that this expression reduces to that given by Forbes [1850] when 
r = 0, i.e. when there are no doublets. 
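
The dice expression, namely t(t − 1)···(t − (s − r) + 1)/t^s multiplied by (1/2)^r s!/(r! (s − 2r)!), may be checked against a direct enumeration for small numbers of dice. The Python sketch below is ours and forms no part of Donkin's paper; the names `doublets_probability` and `brute_force` are illustrative assumptions. The first function evaluates the closed form in exact fractions; the second counts, among all t^s outcomes, those showing exactly r different doublets and s − 2r further different numbers.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def doublets_probability(s, t, r):
    """Closed form: r different doublets and s - 2r different singles
    in one throw of s dice, each with t faces."""
    if 2 * r > s or s - r > t:
        return Fraction(0)
    falling = 1  # t (t - 1) ... (t - (s - r) + 1)
    for k in range(s - r):
        falling *= t - k
    ways_to_pair = Fraction(factorial(s), 2**r * factorial(r) * factorial(s - 2 * r))
    return Fraction(falling, t**s) * ways_to_pair


def brute_force(s, t, r):
    """The same probability, by enumerating every outcome."""
    wanted = [1] * (s - 2 * r) + [2] * r
    hits = 0
    for roll in product(range(t), repeat=s):
        if sorted(Counter(roll).values()) == wanted:
            hits += 1
    return Fraction(hits, t**s)


print(doublets_probability(4, 3, 1), brute_force(4, 3, 1))  # 4/9 4/9
```

Setting r = 0 leaves only the factor t(t − 1)···(t − s + 1)/t^s, the probability that all s dice show different faces.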
Donkin now concludes by saying 


I should consider it a great waste of time and labour to at- 
tempt anything like a numerical result in the actual case. All 
that I have aimed at is to show that there is no real difficulty 
of principle in applying the theory of probabilities to this and 
similar questions, however impracticable it may be to obtain a 
complete numerical solution. [p. 466] 


In 1859 and 1860 Newcomb published a series of notes on probability in 
the Mathematical Monthly. In the fourth of these he discusses the Poisson 
distribution and applies it 


to the determination of the probability that, if the stars were 
scattered at random over the heavens, any small space selected 
at random would contain s stars. [1860a, p. 137]


Taking N as the whole number of stars, h as the number of units of space 
and l as “the extent of space selected at random” [p. 137], Newcomb finds 
the desired probability P to be given by 

\[ P = \frac{N^{s} l^{s}}{h^{s}\, s!}\, e^{-Nl/h} . \]

A specific numerical example, with which we shall not concern ourselves, 
then follows. 

The general conclusion at which Newcomb arrives is, however, that de- 
spite the vagueness and uncertainty present in the problem, Michell’s “gen- 
eral method is ... better applicable to this particular problem than that 
given above” [p. 138]. 

In his discussion of the simple test of significance R.A. Fisher [1956/1973] 
wrote 


I find the details of Michell’s calculation obscure, and suggest 
the following argument. [1973, p. 41]


His reasoning runs as follows: take the fraction of the celestial sphere that 
lies within a circle of radius a minutes to be

\[ p = \left(\frac{a}{6875.5}\right)^{2} . \]

Thus, on taking a to be 49 minutes (the number of minutes from Maia to 
its fifth nearest neighbour, Atlas), we get 


\[ p = \left(\frac{1}{140.316}\right)^{2} \approx \frac{1}{19{,}689} . \]


Recalling that Michell considered 1,500 stars to be of the required mag- 
nitude, we find that of the 1,499 stars remaining (other than Maia), the 
expected number lying within this distance is 

\[ m = \frac{1{,}499}{19{,}689} = \frac{1}{13.1345} . \]


The frequency with which 5 stars fall in the stated area is then given
approximately by $e^{-m} m^{5}/5!$, which is roughly 1 in 50,000,000.
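
Fisher's figures are readily reproduced. The Python sketch below is ours; the names `fraction_of_sphere` and `poisson_pmf` are illustrative assumptions. It takes the fraction of the sphere within a minutes of a point as the square of a divided by the number of minutes in two radians (about 6875.5), and evaluates the Poisson term for five neighbours with mean m equal to 1,499 times that fraction.

```python
import math

# Number of minutes of arc in two radians, about 6875.5.
ARCMIN_IN_TWO_RADIANS = 2 * math.degrees(1) * 60


def fraction_of_sphere(a_arcmin):
    """Approximate fraction of the celestial sphere within a_arcmin of a point."""
    return (a_arcmin / ARCMIN_IN_TWO_RADIANS) ** 2


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events when the expected number is mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


p = fraction_of_sphere(49)   # Maia to Atlas: 49 minutes
m = 1499 * p                 # expected number of the other 1,499 stars so close
frequency = poisson_pmf(5, m)

print(f"1/p is about {1 / p:,.0f}")
print(f"m is about {m:.5f}")
print(f"about 1 in {1 / frequency:,.0f}")
```

The last figure comes out a little above fifty million, in agreement with the rounded value quoted.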


Michell arrived at a chance of only 1 in 500,000 but the higher 
probability obtained by the calculations indicated above is am- 
ply low enough to exclude at a high level of significance any 
theory involving a random distribution. [Fisher 1973, p. 42] 


Michell’s astronomical work cannot be too highly appreciated. Hardin 
[1966] writes 


It was Michell’s merit to have been one of the first to concern 
himself with the physical characteristics of the stars, and to 
have made the first application of statistics to the distribution 
of the stars in space. [p. 35] 


For further comments on Michell’s work⁴⁴, and the remarks of Forbes,
Herschel and Donkin, the reader may be referred to Jevons [1877], where 
Michell’s investigations are described as “admirable speculations” [p. 212] 
and where it is noted that “The conclusions of Michell have been entirely 
verified by the discovery that many double stars are connected by gravita- 
tion” [pp. 247-248]: Jevons also concludes that any error there may be in 
Michell’s work lies in his methods of calculation and “not in the general 
validity of his reasoning and conclusions” [p. 248]. Proctor [1872, pp. 314- 
316] discusses a similar problem, as does Bertrand⁴⁵ [1907, art. 135], while
Porter [1986, p. 79] proffers some general comments on Donkin, Forbes and
Herschel. Venn [1888, chap. XX, §§21-23] is also pertinent, as are Hailperin
[1986, §6.1] and Sheynin [1984, §5]. The question also received some
consideration in Poincaré [1912].

In 1888 a detailed investigation was undertaken by Lupton of the ar- 
guments of Michell, Struve and Forbes, it being concluded that the lat- 
ter’s methods were the least open to objection. Kleiber [1887], [1888] dis- 
sented sharply from this view, finding on the contrary that Forbes’s ex- 
periments in fact supported Michell’s argument: Lupton was not altogether 
convinced, as his further letter of 1888 showed. Keynes [1921], in a careful 
discussion, found that “Michell’s argument owes more, perhaps, to Daniel 
Bernoulli than to Bayes” [chap. XVI, footnote to §11] and concluded further
[chap. XXV] that Michell’s argument was in part invalid and elsewhere less
conclusive than he had supposed. An excellent modern discussion is
provided by Gower [1982]⁴⁶.

Before we leave Michell’s essay, it might be of interest more closely to 
examine some of the alternative formulae proposed. There can be no doubt 
that Michell’s formula is wrong: as Gower [1982, p. 148] has pointed out, 
the probability found does not reduce, as it should, to zero for n > 138,131. 
The error clearly arises from the tacit assumption that the events whose 
probabilities are multiplied together are independent, whereas in fact the 
event that star A is more than one degree from any other star is not inde- 
pendent of the event that star B is more than one degree from any other 
star. 

Turning next to Struve’s work, we recall that he found 


\[ \pi_1 = \Pr[\text{any binary pair falls in a small circle of area } s] = \binom{n}{2} p , \]

where p = s/S. It thus follows that

\[ \pi_2 = \Pr[\text{no binary pair falls in a small circle of area } s] = 1 - \binom{n}{2} p . \]

Forbes’s argument, on the other hand, yields

\[
\begin{aligned}
\pi_3 &= \Pr[\text{all dice show different faces}] \\
&= \Pr[\text{no two stars are in the same small circle}] \\
&= \nu(\nu - 1)\cdots(\nu - n + 1)/\nu^{n} ,
\end{aligned}
\]

and thus

\[
\begin{aligned}
\pi_4 &= \Pr[\text{at least two dice show the same face}] \\
&= \Pr[\text{at least two stars are in the same small circle}] \\
&= 1 - \nu!/[(\nu - n)!\,\nu^{n}] .
\end{aligned}
\]


Using Michell’s figures for β Capricorni, and (9) where necessary, one
obtains

\[
\begin{aligned}
\pi_1 &= 6.189766706 \times 10^{-3} \\
\pi_2 &= 9.938102333 \times 10^{-1} \\
\pi_3 &= 9.937429075 \times 10^{-1} \\
\pi_4 &= 6.257092500 \times 10^{-3} .
\end{aligned}
\]

A comparison of π₂ and π₃ (or π₁ and π₄) shows that, even though the
numerical values are markedly similar, these probabilities are in fact answers
to different questions. That the numerical answers coincide is a consequence
of the fact that, for large n and very much larger ν (= 1/p),


ee eee re 


2) y 2v 
v! uA n(2n — 1) 
(v — n)! vu" - Qu 
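
The closeness of the two kinds of answer when ν greatly exceeds n² may be illustrated numerically. In the Python sketch below, which is ours, the function names and the values n = 20 and ν = 100,000 are assumptions chosen purely for illustration and are not Michell's figures. The first function evaluates the Struve-type expression 1 − C(n, 2)p with p = 1/ν, the second the Forbes-type product ν(ν − 1)···(ν − n + 1)/ν^n, both in exact fractions.

```python
from fractions import Fraction
from math import comb


def struve_no_pair(n, nu):
    """1 - C(n, 2) p, with p = 1/nu."""
    return 1 - Fraction(comb(n, 2), nu)


def forbes_all_different(n, nu):
    """nu (nu - 1) ... (nu - n + 1) / nu**n: no two of n dice with nu faces agree."""
    prob = Fraction(1)
    for k in range(n):
        prob *= Fraction(nu - k, nu)
    return prob


n, nu = 20, 100_000  # illustrative values only
pi2 = struve_no_pair(n, nu)
pi3 = forbes_all_different(n, nu)
print(float(pi2), float(pi3), float(pi3 - pi2))
```

The product never falls below the linear expression, and the difference is of the order of n⁴/ν², which is why the two are numerically so close in the circumstances considered.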


In conclusion, let us see whether Michell is in fact guilty of some of the 
charges levelled against him. Recall that the method he advocated con- 
sisted of two parts, viz. 


(i) the inferring of design, or some general law, from a general analogy, 
and 

(ii) the greatness of the odds against things having been in the present 
situation, were it not for some such cause. 


If one denotes by D the event that a certain group of stars (e.g. those in
β Capricorni or the Pleiades) has a certain physical distribution, and by R
the event that the stars are randomly scattered, then one sees that Michell
has in each of his examples calculated Pr[D | R]. Further, in the case of β
Capricorni he states


If we now compute ... what the probability is, that no two 
stars ... should have been within so small a distance from each 
other, as the two stars β Capricorni, ... we shall find it to be
about 80 to 1 [p. 246] 


while in that of the Pleiades he writes


we shall find the odds to be near 500 000 to 1, that no six
stars, ... scattered at random, ... would be within so small a 
distance from each other as the Pleiades are. [p. 246] 


Thus “odds” and “probability” are used in an apparently synonymous man- 
ner. 

What Michell is in fact concluding, then, is that Pr[¬R | D] is large, or
equivalently that Pr[R | D] is small (cf. Hailperin [1986, p. 356]). Since

\[ \Pr[R \mid D] = \Pr[D \mid R]\,\Pr[R] / \Pr[D] \qquad (10) \]

and since Pr[D | R] has been found to be small (1/80 for β Capricorni, and
1/496000 for the Pleiades), it is “clear” that Pr[R | D] will indeed be small,
provided, of course, that the other terms in (10) are of appropriate size.
Thus Michell has clearly made use of part (ii) of his method. 

As regards part (i), notice that, after considering in detail the cases of 
β Capricorni and the Pleiades, Michell writes


If, besides these examples that are obvious to the naked eye, we 
extend the same argument to the smaller stars, as well those 
that are collected together in clusters, such for example, as the 
Præsepe Cancri, the nebula in the hilt of Perseus’s sword, &c.
as to those stars, which appear double, treble, &c. when seen 
through telescopes, we shall find it still infinitely more conclu- 
sive, both in the particular instances, and in the general analogy, 
arising from the frequency of them. [pp. 247-249] 


This “analogy” argument may perhaps also be seen as being implied by 
the long quotation given above from pages 249-250 of Michell’s memoir. 


5.5 Nicolas de Beguelin (1714-1789) 


The only memoir by Beguelin⁴⁷ that has any bearing on our subject (and
that bearing, let it be admitted, is but slight) is entitled Sur l’usage du
principe de la raison suffisante dans le calcul des probabilités, a memoir
published in the volume of 1767 of the Histoire de l’Académie royale des
Sciences et Belles-Lettres, Berlin (published in 1769), pp. 382-412.

In a reference to an earlier memoir⁴⁸ Beguelin stresses the importance
that prior information has in probability calculations:


j’ai montré dans un Mémoire précédent que la doctrine des
probabilités étoit uniquement fondée sur le principe de la raison
suffisante; il ne seroit donc pas surprenant que les Mathématiciens
ne fussent pas d’accord entr’eux dans la solution des
problemes qui ont la probabilité pour objet; leurs calculs sont
de vérité nécessaire, mais la nature du sujet auquel ils les
appliquent ne l’est pas. Les vérités contingentes ne peuvent être
démontrées qu’en partant d’une supposition; & quelque plausible
qu’une supposition est, elle n’en exclut pas nécessairement
d’autres, qui peuvent servir de base à d’autres calculs, & donner
par conséquent des resultats différents. [p. 382]


He goes on next to distinguish between the possibility and the probability 
of an event: 


toute combinaison qui n’implique pas contradiction est possible,
& comme on ne sauroit impliquer à demi, toutes les combinaisons
possibles sont également possibles; ce n’est qu’improprement
qu’on diroit d’un événement possible, qu’il est plus ou
moins possible qu’un autre; il n’y a point de milieu, ni de
degrés à concevoir, entre ce qui peut exister, & ce qui répugne
à l’existence. Mais la simple possibilité ne suffit pas pour donner
l’existence à un événement; il faut de plus qu’il y ait une
raison suffisante qui détermine l’événement à être plutôt celui
qu’il est, qu’un des autres également possibles: & c’est ici que
commence la probabilité. [p. 383]


Then follows a clear definition of “sufficient reason”, viz. 


la raison suffisante de la probabilité d’un événement, c’est la
preponderance des raisons de s’attendre à cet événement sur
celles de s’attendre à l’événement contraire. [p. 383]


Todhunter is perhaps a little harsh in writing “the memoir does not 
appear of any value whatever” [1865, art. 616]: certainly the emphasis on 
the bearing of prior knowledge on probability calculations is important, 
though little else seems relevant here. 


5.6 Joseph Louis de la Grange (1736-1813) 


Of this famous mathematician’s many writings, the only one at all pertinent
to our subject is his first memoir on probability, viz. Mémoire sur l’utilité de
la méthode de prendre le milieu entre les résultats de plusieurs observations,
dans lequel on examine les avantages de cette méthode par le calcul des
probabilités, et où l’on résout différents problèmes relatifs à cette matière.
This was published⁴⁹ in volume 5 of the Miscellanea Taurinensia (1770-
1773), pp. 167-232. Todhunter [1865] remarks on the merit of this memoir
in the following words: 


The memoir at the time of its appearance must have been ex- 
tremely valuable and interesting, as being devoted to a most 
important subject; and even now it may be read with advantage.
[art. 556]


Of the ten problems⁵⁰ considered in this memoir, the sixth is pertinent to
our work. Because it is both an early example in “inverse probability” and
a precursor of Pearson’s important investigations of the (P, χ²) problem⁵¹,
we have chosen to discuss the question in some detail. The problem is posed
by Lagrange as follows⁵²:


Je suppose qu’on ait vérifié un instrument quelconque, et qu’ayant
réitéré plusieurs fois la même vérification on ait trouvé différentes
erreurs, dont chacune se trouve répétée un certain nombre de
fois; on demande quelle est l’erreur qu’il faudra prendre pour la
correction de l’instrument. [p. 200]


Supposing errors p, q, r, ... to be made α, β, γ, ... times respectively in n
observations, Lagrange assumes the unknown frequencies to be a, b, c, ... ,
and considers the polynomial $(ax^{p} + bx^{q} + cx^{r} + \cdots)^{n}$, with general term
$N(ax^{p})^{\alpha}(bx^{q})^{\beta}(cx^{r})^{\gamma}\cdots$. Now the coefficient $Na^{\alpha}b^{\beta}c^{\gamma}\cdots$ of
$x^{p\alpha + q\beta + r\gamma + \cdots}$, divided by $(a + b + c + \cdots)^{n}$, gives the probability that the
errors p, q, r, ... will be found together in such a way that p occurs α times,
q β times, r γ times, &c. From an earlier problem (viz. the fifth) it is known
that N = n!/(α! β! γ! ...). The most probable value is then (correctly) taken to
be the highest term in the multinomial, which yields

\[ \alpha = \frac{na}{a + b + c + \cdots}, \quad \beta = \frac{nb}{a + b + c + \cdots}, \quad \gamma = \frac{nc}{a + b + c + \cdots}, \quad \ldots \]

from which the unknowns a, b, c, ... may be determined. Again by Problem
V, it follows that the correction to be made is (αp + βq + γr + ···)/n,
“c’est-à-dire égale à l’erreur moyenne entre toutes les erreurs particulières
que les n vérifications ont données” [p. 201].

Now, as Pearson has noted⁵³, the α, β, γ, ... that give the maximum term
in the multinomial are taken by Lagrange as being the observed α, β, γ, ...:
this may well be reasonable, but no discussion of the point is essayed.

Following a corollary (which does not concern us at the moment) may
be found two Remarques, in which Lagrange turns to a problem of inverse
probability⁵⁴. These remarks Todhunter dismisses as follows:


Lagrange proposes further to estimate the probability that the 
values of a,b,c,... thus determined from observation do not 
differ from the true values by more than assigned quantities. 
This is an investigation of a different character from the others 
in the memoir; it belongs to what is usually called the theory 
of inverse probability, and is a difficult problem. 

Lagrange finds the analytical difficulties too great to be overcome;
and he is obliged to be content with a rude approximation.
[art. 562]


Condemning Todhunter for his myopia, Pearson [1978, p. 599] notes that 
Lagrange came “within an ace” of solving the (P, χ²) problem, a tough
nut cracked by Pearson himself⁵⁵ in 1900. However, one might plead in
mitigation that Todhunter was writing a history, and not a statistical text. 
Thus while he was perhaps a little brusque in his dismissal of what has 
proved to be statistically feracious, it is a bit harsh to judge him for lacking 
the foresight to appreciate its value. 

As the Normal distribution⁵⁶ was reached by de Moivre as a limit to
the skew binomial⁵⁷ in 1733 so, using the multinomial, Lagrange arrived at
the multivariate Normal distribution. Let us examine the derivation. The 
problem posed is the following: 


... on voulait savoir de plus quelle est la probabilité que ces
mêmes valeurs [viz. a, b, c, ... ] ne s’écarteront pas de la vérité
d’une quantité quelconque ±(rs/n) [p. 202]


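where s = a + b + c + ··· . (Notice that the true values are now assumed
unknown.)

Noting that a, b, c, ... are proportional to α, β, γ, ... only when one is
working with the most probable value of the multinomial, Lagrange
considers now

\[ a = \frac{s(\alpha + x)}{n}, \quad b = \frac{s(\beta + y)}{n}, \quad c = \frac{s(\gamma + z)}{n}, \quad \ldots \]

taking x, y, z, ... equal to ±1, ±2, ..., ±r successively, subject to the
constraint that

\[ x + y + z + \cdots = 0 \]

since, by hypothesis,

\[ \alpha + \beta + \gamma + \cdots = n \quad\text{and}\quad a + b + c + \cdots = s . \]

If P is the probability that a = sα/n, b = sβ/n, c = sγ/n, ... , then
substitution of these values in an earlier result (Problem V) yields

\[ P = \frac{n!}{n^{n}}\, \frac{\alpha^{\alpha} \beta^{\beta} \gamma^{\gamma} \cdots}{\alpha!\, \beta!\, \gamma! \cdots} . \]

Similarly, if Q is the probability that one has

\[ a = \frac{s(\alpha + x)}{n}, \quad b = \frac{s(\beta + y)}{n}, \quad c = \frac{s(\gamma + z)}{n}, \quad \ldots \]

then

\[ Q = P(1 + x/\alpha)^{\alpha} (1 + y/\beta)^{\beta} (1 + z/\gamma)^{\gamma} \cdots = PV, \text{ say.} \]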
The desired probability will then be P∫V.

Noting the difficulty of evaluating this integral in general, Lagrange
remarks that it can be evaluated by multiplying the mean value of V by
the number of all the values of V entering into the integral, “et la difficulté
ne consistera qu’à trouver ce nombre” [p. 203]. Denoting by m the number
of the quantities α, β, γ, ..., he points out that the number required will
be the coefficient T of $u^{0}$ in the expansion of

\[ (u^{-r} + u^{-r+1} + \cdots + u^{-1} + u^{0} + u^{1} + \cdots + u^{r-1} + u^{r})^{m} , \]

whence, in fact [p. 203],

\[
\begin{aligned}
T ={}& \frac{(mr+1)(mr+2)\cdots(mr+m-1)}{1\cdot 2\cdot 3\cdots(m-1)} \\
&- m\,\frac{(mr-2r)(mr-2r+1)\cdots(mr-2r+m-2)}{1\cdot 2\cdot 3\cdots(m-1)} \\
&+ \frac{m(m-1)}{2}\,\frac{(mr-4r-1)(mr-4r)\cdots(mr-4r+m-3)}{1\cdot 2\cdot 3\cdots(m-1)} - \cdots
\end{aligned}
\]

If W denotes the mean value of V, then ∫V is to be approximated by TW,
and the desired probability is then approximately PTW.
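
The number T may be checked for small m and r. It is the coefficient of u⁰ in the m-th power of u^{−r} + ··· + u^{r}, or equivalently the number of ways of choosing m integers, each between −r and r inclusive, with sum zero. The Python sketch below is ours; the function names are illustrative assumptions, and the alternating sum it evaluates is the standard expansion of the coefficient of u^{mr} in ((1 − u^{2r+1})/(1 − u))^m, which we take (as an assumption) to be the series that Lagrange writes out term by term.

```python
from itertools import product
from math import comb


def count_by_enumeration(m, r):
    """Number of (x_1, ..., x_m), each in -r..r, whose sum is zero."""
    values = range(-r, r + 1)
    return sum(1 for xs in product(values, repeat=m) if sum(xs) == 0)


def count_by_formula(m, r):
    """Coefficient of u**0 in (u**-r + ... + u**r)**m as an alternating sum."""
    total = 0
    k = 0
    while m * r - k * (2 * r + 1) >= 0:
        top = m * r - k * (2 * r + 1) + m - 1
        total += (-1) ** k * comb(m, k) * comb(top, m - 1)
        k += 1
    return total


print(count_by_enumeration(3, 1), count_by_formula(3, 1))  # 7 7
```

The successive terms of the sum are the binomial coefficients written out as ratios of products: that with k = 0 is (mr + 1)(mr + 2)···(mr + m − 1) divided by 1·2·3···(m − 1).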

If, however, one were to take the smallest value of V, rather than the
mean value W, one would necessarily underestimate the true value of ∫V,
and hence the desired probability. Thus one may advantageously wager
PTW to 1— PTW that in taking 


one does not make a mistake of an amount greater in absolute value than 
r/n. 

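In his Remarque II, Lagrange essentially “passes to the limit”: that is,
he supposes n (and consequently α, β, γ, ... ) to be very large. Proceeding
from what is essentially the Stirling-de Moivre theorem, he deduces that

\[ \frac{1\cdot 2\cdot 3\cdots u}{u^{u}} = \frac{\sqrt{\pi u}}{e^{u}} . \]

His “π” being what one would nowadays call “2π”, we shall change to the
modern notation. It follows that

\[ \frac{n!}{n^{n}}\, \frac{\alpha^{\alpha} \beta^{\beta} \gamma^{\gamma} \cdots}{\alpha!\, \beta!\, \gamma! \cdots} = \sqrt{\frac{2\pi n}{(2\pi\alpha)(2\pi\beta)(2\pi\gamma)\cdots}} . \]

Turning next to the expression V Lagrange shows that

\[
\begin{aligned}
\log V &= \alpha \log\left(1 + \frac{x}{\alpha}\right) + \beta \log\left(1 + \frac{y}{\beta}\right) + \gamma \log\left(1 + \frac{z}{\gamma}\right) + \cdots \\
&= x + y + z + \cdots - \frac{1}{2}\left(\frac{x^{2}}{\alpha} + \frac{y^{2}}{\beta} + \frac{z^{2}}{\gamma} + \cdots\right)
 + \frac{1}{3}\left(\frac{x^{3}}{\alpha^{2}} + \frac{y^{3}}{\beta^{2}} + \frac{z^{3}}{\gamma^{2}} + \cdots\right) - \cdots \qquad (11) \\
&\approx -\frac{x^{2}}{2\alpha} - \frac{y^{2}}{2\beta} - \frac{z^{2}}{2\gamma} - \cdots
\end{aligned}
\]

since x + y + z + ··· = 0, and the cubic term in (11) above (given by Pearson
[1978, p. 600] but not by Lagrange) is negligible in comparison with the
quadratic.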

On defining x = ξ√n, y = η√n, z = ζ√n, ... and α/n = A, β/n = B,
γ/n = C, ... , one deduces that ξ + η + ζ + ··· = 0, A + B + C + ··· = 1,
and

\[ PV = [(2\pi n)^{m-1} ABC\cdots]^{-1/2} \exp[-\tfrac{1}{2}(\xi^{2}/A + \eta^{2}/B + \zeta^{2}/C + \cdots)] . \]

Now, when the increment or the difference of the quantities x, y, z, ... is 1,
the difference of the variables ξ, η, ζ, ... will be 1/√n (and hence infinitely
small). Denoting this difference by dθ, one will have

\[ PV = [(2\pi)^{m-1} ABC\cdots]^{-1/2} \exp[-\tfrac{1}{2}(\xi^{2}/A + \eta^{2}/B + \zeta^{2}/C + \cdots)]\, d\theta^{m-1} . \]


This result, incidentally, Pearson [1978, p. 600] finds “extraordinarily
brilliant”, in particular for the following reasons⁵⁸


(i) a measure of the terms we are neglecting;


(ii) it deduces the probability that the true values differ from
the observed values and not the inverse relation;


(iii) it involves precisely the P and the χ² that I obtained by
a most troublesome algebraic process in 1900.


Lagrange next turns his attention to the (m − 1)-fold integration of
$\exp[-\tfrac{1}{2}(\xi^{2}/A + \eta^{2}/B + \zeta^{2}/C + \cdots)]\, d\theta^{m-1}$, and takes note that there are only
m − 1 independent variables, which results in his substituting for ξ,
−η − ζ − ··· . The solution of the general problem being only obtainable by
tables⁵⁹, Lagrange restricts his attention to the case in which only two errors
are present. Pearson [1978, p. 602] has pointed out that certain numerical errors
present in this discussion suggest that Lagrange copied de Moivre’s results
in places. Nevertheless, the right answer for the approximate evaluation of

\[ P = (2\pi ABn)^{-1/2} \exp(-\tfrac{1}{2}\, \xi^{2}/AB) \]

is obtained, viz. 0.682688. The section concludes with further discussion,
not relevant to the present study, of the multivariate case.


5.7 William Emerson (1701-1782)


In 1776 a treatise entitled Miscellanies, or a Miscellaneous Treatise;
containing several Mathematical Subjects, and published by J. Nourse of
London, appeared under the name of Emerson⁶⁰.

The first article [pp. 1-48] of this treatise is devoted to the laws of chance. 
The treatment is fairly standard: indeed, one must agree with Todhunter 


[1865] that 


There is nothing remarkable about the work except the fact 
that in many cases instead of exact solutions of the problems 
Emerson gives only rude general reasoning which he considers 
may serve for approximate reasoning. [art. 641] 


In Emerson’s own words 


It may be observed, that in many of these problems, to avoid 
more intricate methods of calculation, I have contented myself 
with a more lax method of calculating, by which I only approach 
near the truth. [1776, p. 47] 


That Emerson expected criticism of his essay (perhaps even welcomed 
it) is shown by one of his introductory paragraphs, in which he writes 


Therefore my readers may please to take notice, that if any
envious, abusive, dirty Scribbler, shall hereafter take it into his
head to creep into a hole like an Assassin, and lie lurking there
on purpose to scandalize and rail at me; and dare not shew his
face like a Man; I shall give myself no manner of trouble about
such an Animal, but look upon him as even below contempt.
[p. v]


Harsh words, but perhaps not out of character for one who could decline
an F.R.S.⁶¹

The only part of the Essay that might possibly be of interest is Article 1,
The Laws of Chance (pp. 1-48). Here Emerson sets out the following
definitions and axioms⁶²:


Definition I. Chance is an event, or something that happens 
without the design or direction of any agent; and is directed or 
brought about by nothing but the laws of nature. 

Def. II. The probability or improbability of an event happen- 
ing, is the judgement we form of it, by comparing the number 
of Chances there are for its happening, with the number of 
Chances for its failing.

Def. III. Expectation in play, is the value of a man’s Chance;
that is, of the thing played for, considered with the probability 
of gaining it; and therefore is the product of its value multiplied 
by the probability of obtaining the prize. 

Def. IV. Risk is the value of the stake considered with the prob- 
ability of losing it; & therefore is the product of its value mul- 
tiplied by the probability of losing it. 

Def. V. Events are independent when they have no manner of 
connection with one another; or when the happening of one nei- 
ther forwards nor obstructs the happening of any other of them. 
Def. VI. An event is dependent when the probability of its hap- 
pening is altered by the happening of some other. 

Axiom I. In computing the number of Chances, it is supposed 
that all Chances are equal, or made with equal facility. 

Axiom II. The whole expectation for any prize, is the sum of
all the expectations upon the particulars. 

Axiom III. The value of any Chance or expectation is what 
would purchase the like Chance or expectation, in a fair game. 


[pp. 2-3] 


5.8 George Louis Leclerc, Comte de Buffon 
(1707-1788) 


From the pen (or quill) of this distinguished naturalist⁶³ there flowed a
memoir entitled Essai d’Arithmétique Morale, which work, published in
1777, constitutes part of the Supplément to the Histoire Naturelle, Tome
IV. Exactly when this memoir⁶⁴ was written is uncertain, though Gouraud
says


Cet ouvrage, dont la composition remonte à 1760 environ, ne
parut qu’en 1777 dans le tome IV du Supplément à l’Histoire
naturelle. [1848, p. 54]


Most of this long essay has little (if indeed any) bearing on our subject. 
However, after distinguishing three kinds of truths (viz. geometrical truths 
known by reasoning, physical truths known by experience, and truths
believed on testimony), Buffon illustrates those of the second kind by
considering the question⁶⁵ of the sun’s rising. Like Price, Buffon stresses
that, to the man who has only once seen the rising and the setting of the
sun, the second rising will be


une première expérience, qui doit produire en lui l’espérance de
revoir le soleil, & il commence à croire qu’il pourrait revenir,
cependant il en doute beaucoup. [1778, p. 76]


With the repeated returns of the sun the observer’s doubt diminishes, until


il croira être certain qu’il le verra toujours paroître, disparoître
& se mouvoir de la même façon. [1778, p. 77]


Buffon then concludes that the probabilities of subsequent risings increase
like the sequence 1, 2, 4, ..., $2^{n-1}$, the meaning of this becoming clear only
later in the Essai, where we read


... $2^{13}$ = 8192, ... & par conséquent lorsque cet effet est arrivé
treize fois, il y a 8192 à parier contre 1, qu’il arrivera une
quatorzième fois ... [pp. 85-86]


that is, a probability of $2^{n-1}$ is to be interpreted as odds of $2^{n-1}$ to 1 in
favour of the event in question⁶⁶.

As a numerical example, it is supposed that the age of the earth is 6,000
years, with leap years being neglected. Buffon then asserts that, if one
knows that the sun has risen 2,190,000 times, the probability of its rising
once more is $2^{2,189,999}$ (or, as we have seen, $2^{2,189,999}$ to 1). This is plainly
inconsistent with Laplace’s expression (n + 1)/(n + 2), though, as Sheynin
[1969] and Zabell [1988a] have noted, it is more in line with that given 
by Price — if we gloss over a confusion between “number of risings” and 


“number of returns” (see §4.6). 
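
The gap between Buffon's doubling scheme and Laplace's expression is easily exhibited numerically. The following is a minimal sketch in Python; the function names are ours, introduced purely for illustration, and the reading of Buffon's sequence as odds of $2^{n-1}$ to 1 is the one described in the text.

```python
from fractions import Fraction


def buffon_odds(n: int) -> int:
    """Odds (to 1) read off Buffon's sequence 1, 2, 4, ..., 2^(n-1)."""
    return 2 ** (n - 1)


def odds_to_probability(odds: int) -> Fraction:
    """Convert odds of `odds` to 1 in favour into a probability."""
    return Fraction(odds, odds + 1)


def laplace_probability(n: int) -> Fraction:
    """Laplace's expression (n+1)/(n+2) after n observed occurrences."""
    return Fraction(n + 1, n + 2)


if __name__ == "__main__":
    # Buffon's own illustration: thirteen occurrences, 8192 to 1 on a fourteenth.
    print(buffon_odds(14))
    print(float(odds_to_probability(buffon_odds(14))))
    # Laplace's value after the same thirteen occurrences.
    print(float(laplace_probability(13)))
    # The exponent in the 6,000-year example: 2^(2,189,999).
    print(buffon_odds(2_190_000).bit_length() - 1)
```

Buffon's odds grow geometrically with the number of occurrences, whereas Laplace's probability approaches 1 only at the rate $1/(n+2)$.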


5.9 Jean Trembley (1749-1811) 


Only one work by this author contains matter directly pertinent to our 
topic, viz. the memoir De probabilitate causarum ab effectibus oriunda: 
disquisitio mathematica, published in volume 13 of the Commentationes 
Societatis Regiae Scientiarum Gottingensis, 1795-1798, pp. 64-119 of Com-
mentationes mathematicae (published 1799).

The scope of this work is clearly delineated in the opening paragraph:


Hanc materiam pertractarunt eximii Geometrae, ac potissimum 
Cel. la Place in Commentariis Academiae Parisinensis. Cum 
autem in hujusce generis Problematibus solvendis sublimior et 
ardua analysis fuerit adhibita, easdem quaestiones methodo 
elementari ac idoneo usu doctrinae serierum aggredi operae 
pretium duxi. Qua ratione haec altera pars calculi Probabil- 
ium ad theoriam combinationum reduceretur, sicut et primam 
reduxi in dissertatione ad Regiam Societatem transmissa. Pri- 
marias quaestiones hic breviter attingere conabor, methodo di- 
lucidandae imprimis intentus [§1] 


— though as Todhunter has noted, the claims of “lucidness” and “rigour” 
are perhaps a little exaggerated.

The first problem Trembley considers is the following: let there be an
urn containing an infinite number of white and black balls in unknown
proportion. Let p white and q black balls be withdrawn from the urn: we 
seek the probability of drawing m white and n black balls in future drawings 
(all drawings being made with replacement). The solution to this problem 
is, as we shall see in our chapter on Laplace, given by 


$$
\binom{m+n}{m} \int_0^1 x^{m+p}(1-x)^{n+q}\,dx \bigg/ \int_0^1 x^{p}(1-x)^{q}\,dx ,
$$


though Trembley does not give his solution in this form. 
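
For integer exponents the integrals in this expression reduce to ratios of factorials, so that the probability can be computed exactly. A minimal sketch follows, assuming the binomial-coefficient form of the expression given above; the names `beta_integral` and `predictive` are ours and are not Trembley's or Laplace's.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact value of the integral of x^a (1-x)^b over [0, 1], a and b non-negative integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive(m: int, n: int, p: int, q: int) -> Fraction:
    """Probability of m white and n black in m+n future draws, given p white and q black drawn."""
    return comb(m + n, m) * beta_integral(m + p, n + q) / beta_integral(p, q)


if __name__ == "__main__":
    # One further draw after 3 white and 1 black: (p+1)/(p+q+2).
    print(predictive(1, 0, 3, 1))
    # The probabilities over all possible outcomes of four further draws sum to 1.
    print(sum(predictive(m, 4 - m, 3, 1) for m in range(5)))
```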
After discussing the problem thus far, Todhunter goes on to say 


the investigations are only approximate, the error being however 
inappreciable when the number of balls is infinite. If each ball 
is replaced after being drawn we can obtain an exact solution
of the problem by ordinary Algebra ... and of course if the 
number of balls is supposed infinite it will be indifferent whether 
we replace each ball or not, so that we obtain indirectly an 
exact elementary demonstration of the important result which 
Trembley establishes approximately. [art. 766] 


It seems to me that Todhunter has missed, in the original, the sentence 
“Schedulae eductae supponuntur rursus conicil in vas” — or is the empha- 
sis merely on an expert use of algebra to solve the problem? 

Certain other problems, involving balls and urns, are considered by 
Trembley: in each case, however, he relates them to work by Laplace, and 
we shall therefore postpone consideration of Trembley’s transcriptions to 
the appropriate place in Chapter 7. The treatment of the Problem of Points, 
considered by Laplace in his Mémoire sur la probabilité des causes par les 
événemens, is extended slightly by Trembley: to this we shall likewise re- 
turn. 

His preceding discussion, Trembley states, leads to the conclusion that 
the probability of causes, generated by effects, requires a method that con-
sists of two parts:


In prima parte assignantur formulae quae repraesentant hanc 
Probabilitatem; in altera parte indicantur approximationes quae 
possibilem reddant usum harum formularum ubi ingentes adsunt 
numeri. [§14] 


The example (again one from Laplace) adduced to illustrate this assertion 
is that concerning the observed difference between the ratio of the number 
of boys born to the number of girls born (in a certain time period) in 
London, and the similar ratio in Paris. As we shall see in the discussion on
Laplace, one is led to consideration of the ratio

$$
\frac{\displaystyle\int_{x=0}^{1}\int_{x'=0}^{x} x^{p}(1-x)^{q}\,x'^{\,p'}(1-x')^{q'}\,dx'\,dx}
     {\displaystyle\int_{x=0}^{1}\int_{x'=0}^{1} x^{p}(1-x)^{q}\,x'^{\,p'}(1-x')^{q'}\,dx'\,dx}
$$


which Trembley evaluates by expansion of the integrands and term-by- 
term integration: an alternative way of reaching his final result is given by 
Todhunter [1865, art. 773]. Using beta-functions and the fact that

$$
\frac{\Gamma(a+1)\,\Gamma(b+1)}{\Gamma(a+b+2)} = \int_0^1 x^{a}(1-x)^{b}\,dx = B(a+1,\,b+1) ,
$$

one can in fact show that the above ratio of integrals is

$$
\binom{p+r}{r}\,\frac{1}{r+s+2}\,\sum_{j=0}^{s} (-1)^{j}\,
\frac{B(p+q+2,\;r+j+1)}{B(p+r+1,\;j+1)\,B(s-j+1,\;r+j+2)} ,
$$

where Trembley’s $p'$ and $q'$ have been replaced, for convenience, by $r$ and
$s$ respectively.
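
A series of this kind is readily checked against the term-by-term integration that Trembley himself employed. The sketch below assumes that the inner integral in the numerator runs from 0 to $x$ and that the series takes the form $\binom{p+r}{r}(r+s+2)^{-1}\sum_j(-1)^j B(p+q+2,r+j+1)/[B(p+r+1,j+1)\,B(s-j+1,r+j+2)]$; the function names are ours.

```python
from fractions import Fraction
from math import comb, factorial


def B(a: int, b: int) -> Fraction:
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def ratio_series(p: int, q: int, r: int, s: int) -> Fraction:
    """The ratio of double integrals written as a finite series of beta functions."""
    total = sum(
        Fraction((-1) ** j) * B(p + q + 2, r + j + 1)
        / (B(p + r + 1, j + 1) * B(s - j + 1, r + j + 2))
        for j in range(s + 1)
    )
    return Fraction(comb(p + r, r), r + s + 2) * total


def ratio_direct(p: int, q: int, r: int, s: int) -> Fraction:
    """The same ratio by expanding (1 - x')^s and integrating term by term."""
    numerator = sum(
        Fraction((-1) ** j * comb(s, j), r + j + 1) * B(p + r + j + 2, q + 1)
        for j in range(s + 1)
    )
    return numerator / (B(p + 1, q + 1) * B(r + 1, s + 1))


if __name__ == "__main__":
    print(ratio_series(3, 2, 1, 4), ratio_direct(3, 2, 1, 4))
```

When the two sets of exponents coincide ($p = r$, $q = s$) both functions return 1/2, as symmetry requires.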


5.10 Pierre Prevost (1751-1839) & 
Simon Antoine Jean Lhuilier (1750-1840) 


There are three memoirs by these authors that have some bearing on 
our subject. The first of these, and, of the three, the only technical one, 
is entitled Sur les probabilités. It occupies pp. 117-142 of the Classe de 
Mathématique of the Mémoires de l’Académie royale des Sciences et Belles- 
Lettres, Berlin 1796 (published 1799), and was read before the Academy 
on the 12th November 1795. 

In this essay Prevost and Lhuilier propose to consider the following prob- 
lem: 


Soit une urne contenant des billets de deux espéces (que j’appell-
erai blancs et noirs), dans un rapport inconnu. Soit tiré succes-
sivement un certain nombre de ces billets, sans remettre dans
l’urne, à chaque extraction, le billet tiré. Connoissant le nom-
bre des billets de chaque espéce qui ont été tirés, on demande la
probabilité que tirant de la méme maniere de nouveaux billets,
en nombre donné, il y en aura des nombres donnés de ces
deux espéces. [p. 117]


As Todhunter [1865, art. 849] has noted, this memoir is the first in which
the urn-sampling problem when the balls sampled are not replaced is
considered.

The solution of this problem requires the following principle: 


Principe étiologique. Si un événement peut étre produit par un
nombre n de causes différentes, les probabilités de l’existence
de ces causes prises de l’événement, sont entr’elles comme les
probabilités de l’événement prises de ces causes. Et (par conse-
quent) la probabilité de l’existence de chacune d’elles est égale
à la probabilité de l’événement prise de cette cause, divisée par
la somme de toutes les probabilités de l’événement prises de
chacune de ces causes. [p. 125]


This principle, “fécond en consequences” [p. 125], is copied verbatim from 
Laplace’s memoir of 1774, though for once an appropriate reference as to 
the source is made in the memoir itself. We shall postpone discussion of 
this principle to the chapter on Laplace. 

The perhaps slightly general statement of the problem as initially posed 
is now refined as follows: 


Probleme. Soit une urne contenant un nombre n de billets; on
a tiré p + q billets, dont p sont blancs & q non-blancs (que
j’appellerai noirs). On demande les probabilités que les billets
blancs & les billets noirs de l’urne étoient des nombres données,
dans la supposition qu’à chaque tirage on n’a pas remis dans
l’urne le billet tiré [p. 126]


and this in turn is further sharpened to


Probléme. Tout étant posé comme dans le §4 [i.e. the preceding
version]. On demande les probabilités d’amener dans un nombre
donné r de nouveaux tirages faits de la méme maniere, des
nombres donnés r − m, & m de billets blancs & noirs. [p. 129]


Immediately following this last problem is the principle of solution; the 
probabilities of the event sought, corresponding to assumptions as to its 
causes, are made up in proportion to the probabilities of these causes and to 
the probabilities of the event depending on these causes, the probability of 
the event being the sum of these probabilities (clearly the principle follows 
from the Principe étiologique mentioned above). 

All solutions are given in product form: full details may be found in
Todhunter [1865, art. 843]. All we shall do here, to give the flavour of the
original presentation, is to present the récapitulation of §7, viz.


On a tiré d’une urne p billets blancs, & q billets noirs, en ne remet-
tant dans l’urne à aucun des tirages le billet extrait. On tire
de nouveau r billets de la méme maniére. On obtient les ex-
pressions suivantes des probabilités que les nombres des billets
blancs & noirs seront comme il suit.


Nombres des billets          Probabilités
blancs      noirs

r           0       $\dfrac{(p+1)(p+2)\cdots(p+r)}{(p+q+2)(p+q+3)\cdots(p+q+r+1)}$

r-1         1       $\dfrac{r}{1}\cdot\dfrac{(p+1)(p+2)\cdots(p+r-1)\,(q+1)}{(p+q+2)(p+q+3)\cdots(p+q+r+1)}$

r-2         2       $\dfrac{r(r-1)}{1\cdot 2}\cdot\dfrac{(p+1)(p+2)\cdots(p+r-2)\,(q+1)(q+2)}{(p+q+2)(p+q+3)\cdots(p+q+r+1)}$

&c.


It is clear from this that the desired probability of drawing r − s white and s
black balls in r further draws can be expressed, more compactly, as

$$
\frac{r!}{s!\,(r-s)!}\;\frac{(p+r-s)!}{p!}\;\frac{(q+s)!}{q!}\;\frac{(p+q+1)!}{(p+q+r+1)!} ,
$$


an expression that the authors note, in their ninth section, is independent 
of the number of balls initially in the urn. 
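
The independence from the size of the urn can be confirmed by direct enumeration. The sketch below rests on an assumption that the memoir leaves implicit in its use of the principe étiologique, namely that every composition of the urn (0, 1, ..., N white tickets among N) is initially equally probable; the function names and the symbol N are ours.

```python
from fractions import Fraction
from math import comb, factorial


def closed_form(p: int, q: int, r: int, s: int) -> Fraction:
    """Probability of r-s white and s black in r further draws, after p white and q black."""
    return Fraction(
        factorial(r) * factorial(p + r - s) * factorial(q + s) * factorial(p + q + 1),
        factorial(s) * factorial(r - s) * factorial(p) * factorial(q)
        * factorial(p + q + r + 1),
    )


def by_enumeration(N: int, p: int, q: int, r: int, s: int) -> Fraction:
    """The same probability for an urn of N tickets, all drawings without replacement."""
    # Weight of each composition W (number of white tickets) given the first sample.
    weights = {W: comb(W, p) * comb(N - W, q) for W in range(N + 1)}
    total = sum(weights.values())
    prob = Fraction(0)
    for W, w in weights.items():
        if w == 0:
            continue  # composition incompatible with the observed sample
        further = Fraction(
            comb(W - p, r - s) * comb(N - W - q, s), comb(N - p - q, r)
        )
        prob += Fraction(w, total) * further
    return prob


if __name__ == "__main__":
    for N in (12, 40, 100):
        print(N, by_enumeration(N, 3, 2, 4, 1), closed_form(3, 2, 4, 1))
```

The urn must of course contain at least p + q + r tickets for the enumeration to make sense.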

So far there is little, if indeed anything, that seems pertinent to our 
work. However, the authors go on to point out that the conclusion noted 
at the end of the preceding paragraph will not hold if sampling is effected 
with replacement. They state that a future memoir would consider this 
latter problem when the number of balls is infinite, but such observations 
apparently did not see the light of day. However Todhunter has considered 
the possible contents of such a memoir, and his thoughts run as follows (we 
present them here as an interesting example of a non-futile speculation): 
suppose that, from an urn with an infinite number of balls, p white and q
black are chosen (without replacement). The probability that the next r+s
draws will result in r white and s black is then, by the Laplace theorems,

$$
\binom{r+s}{r} \int_0^1 x^{p+r}(1-x)^{q+s}\,dx \bigg/ \int_0^1 x^{p}(1-x)^{q}\,dx ,
$$

evaluation of which results in the answer given above for the finite case
(on writing r + s there for the number of further draws). The
coincidence appears to Todhunter to be “remarkable” [art. 847]: but when
we consider that the result for the finite case is independent of the number
m of balls initially in the urn, should we not expect the same answer to
hold “in the limit as m → ∞”, so to speak?

The remaining two memoirs, which were published in the same volume of
the Mémoires de l’Académie royale des Sciences et des Belles-Lettres, being 
less mathematical in nature, are published in the Classe de Philosophie 
Spéculative, the second memoir occupying pages 3-25, and the third pages 
25-41. 

The second memoir, entitled Mémoire sur l’art d’estimer la probabilité
des causes par les effets, is divided into two sections, of which only the first
(Des principes de cette partie de l’art de conjecturer) need be considered
here (the second part, Précis de la marche des applications, consists of
some simple applications of the principle propounded in the first part to
some die problems).

Two early definitions, given at the start of the first section of this, the 
second, memoir, are, I think, of interest. They are the following: 


La Stochastique, ou l’art de conjecturer avec rigueur, ayant eu
pour premier objet d’estimer les hasards du jeu, est fondée sur
des principes relatifs à cette origine. [p. 3]

La Stochastique entiere repose sur cette hypothese que je vais
maintenant énoncer sous une forme plus générale. Hypothése
Stochastique. Lorsqu’en vertu d’une certaine détermination des
causes, plusieurs événemens nous paroissent également possibles;
nous feignons que tous ces événemens ont lieu successivement
tour-à-tour & sans répétition. [p. 6]


Here we find strongly stated the opinion that “la stochastique” (dare we 
translate this by the archaic noun “stochastic”?) has, as its fons et origo 
(and also its prime purpose), games of chance. The “hypothése stochas- 
tique” is also of interest, stating as it does that a judgement of equi- 
possibility is, in a sense, basic, and that it is on the grounds of such a 
judgement that we suppose events occur in turn and without repetition. 

This is one of the few French papers in which reference to earlier authors 
is specifically made. We read further in the memoir, in fact, 


MM. JAC. BERNOULLI, MOYVRE, BAYES & PRICE ont
successivement appliqué le calcul à la recherche des causes. Mais
le principe sur lequel repose la justesse de leurs résultats, n’étant
pas énoncé, laisse un vide qui nuit à la clarté: & ce défaut,
trés-sensible à tout lecteur attentif, a rendu timides ces auteurs
mémes; en sorte que leurs résultats n’ont ni l’étendue ni l’utilité
qu’ils auroient pu leur donner. Et si une sage défiance les a
garantis de l’erreur, l’incertitude de leur marche a laissé des
hasards à courir à ceux qui tenteroient de les suivre. §9. M.
de la Place le premier a posé disertement le principe sur lequel
repose toute cette partie de la théorie des probabilités. Voici
comme il l’a énoncé:

Principe. Si un événement peut étre produit par un nombre n
de causes différentes, les probabilités de l’existence de ces causes
prises de l’événement, sont entre elles comme les probabilités de
l’événement prises de ces causes. [p. 8]


Here I believe Prevost and Lhuilier are unjust to Bayes: it is, I trust, quite 
clear from what has already been said that his presentation and solution 
of the problem were perfectly satisfactory. On Price they are perhaps more 
correct, while their opinions on Bernoulli and de Moivre do not concern 
us. They are, however, quite correct in attributing to Laplace the first 
announcement of the principle. 

The authors then restate this fundamental principle as their principe 
étiologique (in a slightly different form to that given in the first memoir). 
After that, we read


Tel est le principe reconnu par M. de la Place, lequel a rendu
claire & sûre l’estimation de la probabilité des causes par les
effets, & que, par cette raison, j’ai cru devoir appeler Principe
étiologique. [p. 8]


Prevost and Lhuilier now prove Laplace’s principle (their statement of the 
principe étiologique here is framed in terms of dice-throwing), and deduce 
the discrete “Bayes’s theorem” from it. 
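
The passage from the principle to the discrete form of Bayes's theorem, and thence to the probability of a future event, may be illustrated by a short computation. The dice in the sketch below are a hypothetical example of ours and not one taken from the memoir, and the causes are assumed, as the principle requires, to be equally probable a priori.

```python
from fractions import Fraction


def posterior(likelihoods: dict) -> dict:
    """Principe étiologique: each cause has probability proportional to the
    probability it gives the observed event, divided by the sum over all causes."""
    total = sum(likelihoods.values())
    return {cause: lik / total for cause, lik in likelihoods.items()}


def predictive(post: dict, chances: dict) -> Fraction:
    """Probability of a new event: sum of (probability of cause) x (chance under that cause)."""
    return sum(post[cause] * chances[cause] for cause in post)


if __name__ == "__main__":
    # Hypothetical causes: a die with 1, 2 or 3 of its six faces marked with an ace.
    chance_of_ace = {k: Fraction(k, 6) for k in (1, 2, 3)}
    # Observed event: an ace thrown twice in succession.
    likelihoods = {k: c ** 2 for k, c in chance_of_ace.items()}
    post = posterior(likelihoods)
    print(post)
    print(predictive(post, chance_of_ace))
```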

The third memoir is entitled Remarques sur l’utilité & l’étendue du
principe par lequel on estime la probabilité des causes, and it also deals
with Laplace’s fundamental principle. Again there is a reference to Bayes
(as Bayer!). The first section is on the utility of the principle, the second
on its extent, and the third on the comparison of some results of the (prob-
ability) calculus to the judgements of common sense. Of interest to us is
the start of Section 19:


Enfin la théorie de l’estimation des probabilités a posteriori
fournit une conséquence nouvelle & remarquable: c’est que
l’hypothese de l’ignorance des causes, & l’hypothése de la con-
naissance de leur nature, ne donnent les mémes résultats que
dans le cas où on estime une probabilité simple,


this being illustrated by a die-tossing example. The fourth, and final, section 
is devoted to some mathematical developments. 


5.11 Carl Friedrich Gauss (1777-1855) 


Gauss’s works, although legendary, contain relatively little pertinent to
our topic. Indeed the relevant writings are limited to two: an 1815 review
of Laplace’s Sur les cométes and a passage from the 1809 opus Theoria
Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium.
The former will be considered in §7.13; we turn our attention immediately
to the latter.

In Article 176 of the third Section of the second Book Gauss cites the
following result:


Si posita hypothesi aliqua H probabilitas alicuius eventus de-
terminati E est = h, posita autem hypothesi alia H′ illam exclu-
dente et per se aeque probabili eiusdem eventus probabilitas est
= h′: tum dico, quando eventus E revera apparuerit, probabil-
itatem, quod H fuerit vera hypothesis, fore ad probabilitatem,
quod H′ fuerit hypothesis vera, ut h ad h′.


That is, Pr[H | E]/Pr[H′ | E] = Pr[E | H]/Pr[E | H′] under the assump-
tion that Pr[H] = Pr[H′]. Arguing from numbers of equally-likely cases
Gauss demonstrates this theorem, and goes on to apply it in the following
case: suppose there are μ (> ν) functions V, V′, V″, ... of the ν unknown
quantities p, q, r, s, ... . Suppose further that the values of the functions
found by direct observation are V = M, V′ = M′, V″ = M″ etc. Express-
ing by φ(M − V) the probability that observation yields the value M for
V, and substituting in V a determinate system of values for p, q, r, s, ... ,
we find, under the assumption of independent observations, that the prob-
ability (“or expectation”) that all these values will result together from
observation is

$$
\Omega = \varphi(M - V)\,\varphi(M' - V')\,\varphi(M'' - V'') \cdots
$$

Using the theorem cited above one finds that

$$
\Pr[\,p < P < p + dp,\; q < Q < q + dq,\; \ldots \mid V = M,\; V' = M',\; \ldots\,]
= \lambda\,\Omega\,dp\,dq \cdots
$$

where $1/\lambda = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Omega\,dp\,dq \cdots$. This result of course obtains under
the assumption that “omnia systemata valorum harum incognitarum ante
illas observationes aeque probabilia fuisse” [art. 176].

Gauss now concludes that the most probable system of values of the
quantities p, q, r, s, etc. is that which maximizes Ω, whence he deduces that
the probability to be assigned to an error Δ should be given by

$$
\varphi(\Delta) = \frac{h}{\sqrt{\pi}}\, e^{-h^{2}\Delta^{2}} ,
$$

h being “considered as the measure of precision of the observations” (Davis 
[1857, p. 259]). 
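
The connexion between maximizing Ω and the form of the error law may be seen in the simplest case, that of a single unknown observed directly. The sketch below is an illustration of ours and not a reconstruction of Gauss's argument: it takes the error law $\varphi(\Delta) = (h/\sqrt{\pi})\,e^{-h^2\Delta^2}$ as given, assumes every value of the unknown equally probable beforehand, and locates the maximum of Ω by a crude grid search. The observations are invented.

```python
import math


def phi(delta: float, h: float = 1.0) -> float:
    """Error law with measure of precision h: (h / sqrt(pi)) * exp(-h^2 * delta^2)."""
    return h / math.sqrt(math.pi) * math.exp(-((h * delta) ** 2))


def omega(p: float, observations: list, h: float = 1.0) -> float:
    """Probability that all the observed values result together, for a trial value p."""
    return math.prod(phi(m - p, h) for m in observations)


def most_probable_value(observations: list, h: float = 1.0, steps: int = 20001) -> float:
    """Grid search for the value of p that maximizes omega (flat prior assumed)."""
    lo, hi = min(observations), max(observations)
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda p: omega(p, observations, h))


if __name__ == "__main__":
    obs = [9.8, 10.1, 10.4, 9.9]  # hypothetical observations
    print(most_probable_value(obs), sum(obs) / len(obs))
```

Under this error law the most probable value coincides, to within the spacing of the grid, with the arithmetic mean of the observations, whatever the value of h.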


5.12 William Morgan (1750-1833) 


William Morgan, a nephew of Richard Price, was by profession an actu-
ary, and contributed himself nothing to our subject, although his 1783
paper “Probability of Survivorship” was excellent enough to win him the
gold medal of the Royal Society, and a fellowship followed soon thereafter. 

However Morgan also wrote a small monograph bearing the title “Mem- 
oirs of the Life of the Rev. Richard Price, D.D. F.R.S.” in which reference 
was made to Price’s involvement with Bayes’s Essay. William had not in- 
tended to write this memoir: in his foreword he in fact states that his 
brother George had


undertaken to write a very circumstantial history of his uncle’s 
life, and had made a considerable progress in it, when, towards 
the close of the year 1798, a fatal disorder put a final period to 
this and all his other pursuits. 

The confused state in which his papers were found, and the 
indistinct short hand in which they were written, rendered it 
impossible either to arrange or to understand them properly; 
and therefore, after many fruitless attempts, I was reluctantly 
obliged to give up the investigation, and to take upon myself 
the task of writing a new, but more concise account ... 

[1815, pp. vi—vii]. 


The role Richard Price played in communicating Bayes’s Essay to the Royal 
Society is succinctly summarised as follows by Morgan (the quotation is 
long, but I think worthy of inclusion). 


On the death of his friend Mr. Bayes of Tunbridge Wells in the 
year 1761, he was requested by the relatives of that truly inge- 
nious man, to examine the papers which he had written on dif- 
ferent subjects, and which his own modesty would never suffer 
him to make public. Among these Mr. Price found an imperfect 
solution of one of the most difficult problems in the doctrine of 
chances, for “determining from the number of times in which 
an unknown event has happened and failed, the chance that the 
probability of its happening in a single trial lies somewhere be- 
tween any two degrees of probability that can be named.” The 
important purposes to which this problem might be applied, 
induced him to undertake the task of completing Mr. Bayes’s 
solution; but at this period of his life, conceiving his duty to 
require that he should be very sparing of the time which he 
allotted to any other studies than those immediately connected 
with his profession as a dissenting minister, he proceeded very 
slowly with the investigation, and did not finish it till after two 
years; when it was presented by Mr. Canton to the Royal Soci- 
ety, and published in their Transactions in 1763. 

— Having sent a copy of his paper to Dr. Franklin, who was then 
in America, he had the satisfaction of witnessing its insertion 
the following year in the American Philosophical Transactions.


— But not withstanding the pains he had taken with the so- 
lution of this problem, Mr. Price still found reason to be dis- 
satisfied with it, and in consequence added a supplement to 
his former paper; which being in like manner presented by Mr. 
Canton to the Royal Society, was published in the Philosophical 
Transactions in the year 1764. In a note to his Dissertation on 
Miracles, he has availed himself of this problem to confute an 
argument of Mr. Hume against the evidence of testimony when 
compared with the regard due to experience; and it is certain 
that it might be applied to other subjects no less interesting 
and important. By these two communications to the Royal So- 
ciety, Mr. Price had proved himself not unworthy the honour 
of being admitted a member of that learned body, and he was 
accordingly elected in a few months after the publication of his 
second paper. [1815, pp. 24-27] 


5.13 Sylvestre Francois Lacroix (1765-1843) 


In his Traité Elémentaire du Calcul des Probabilités [1816] Lacroix has
this to say on the probability of causes:


C’est ainsi qu’on a posé pour principe que les probabilités des
causes (ou des hypothéses) sont proportionelles aux probabilités
que ces causes donnent pour les événemens observés. [p. 133]


In a footnote to this passage he writes 


Cet énoncé se trouve dans le tome VI des Savans étrangers,
p. 263. Bayes, dans les Transactions philosophiques de 1763,
et Price, dans celle de 1764 (p. 296), s’étaient déja occupés
de ce sujet; mais M. Laplace l’a réduit le premier à la forme
analytique sous laquelle on le traite maintenant, qui en facilite
et en généralise beaucoup les applications. [p. 133]


Once again it is doubtful whether the animadversion to Bayes as having
been concerned with causes is correct. Other pertinent passages are the 
following: 


Enfin il faut remarquer encore que ces fractions, ou les proba-
bilités des diverses hypotheses, se forment en divisant la proba-
bilité de l’événement composé, calculée dans chaque hypothese,
par la somme de ses probabilités dans toutes les hypotheses
[p. 143]


and


On trouverait de méme, pour tout autre exemple, que la proba-
bilité d’un nouvel événement simple s’obtient en calculant, d’aprés
les événemens passés, la probabilité des diverses hypothéses pos-
sibles, et faisant la somme des produits de ces probabilités par
celles de l’événement, prises dans chaque hypothése.

[pp. 135-136]


These statements lead, in a manner that is by now perhaps all too familiar, 
to expressions of the form 


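$$
\frac{\alpha\,x^{m}(1-x)^{n}}{\displaystyle\int_0^1 x^{m}(1-x)^{n}\,dx} ,
$$

and

$$
\frac{p(p-1)\cdots(p-q+1)}{1\cdot 2\cdots q}\;
\frac{\displaystyle\int_0^1 x^{m+p-q}(1-x)^{n+q}\,dx}{\displaystyle\int_0^1 x^{m}(1-x)^{n}\,dx} ,
$$

where [0,1] is divided into small parts, denoted by α.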


5.14 Conclusions and Summary 


In the half-century following the publication of Bayes’s Essay, there seems 
to have been little published that might be regarded not only as pertinent 
but also as original — excluding, of course, the works of Condorcet and 
Laplace, to which we shall turn in the following chapters. 

Hard on the heels of the Essay came a paper communicated by Price to 
the Royal Society, in which Bayes’s proofs of the rules of the Essay were 
detailed and developed. Much of the refinement was due to Price himself. 

It is possible to find in Mendelssohn’s writings a precursor of Laplace’s 
rule of succession, though hindsight and charity are probably required for 
such a discovery. The expression that Buffon advances for the solution of 
similar problems bears no resemblance to Laplace’s, though it is (more or 
less) in accord with Price’s. 

It seems clear to me that the Bayes integrals to be found in some of the 
papers discussed here are in fact due to Laplace, and that a number of 
the results we have noted are but application or development of Laplace’s 
work. 

More noteworthy is the discussion we find here, by Lagrange, of a prob- 
lem in inverse probability — perhaps the first in print. This discussion 
appeared in 1770, a scant six years after publication of Bayes’s results. 
Perhaps one should consider Lagrange, rather than Bayes, as the father 
(albeit unwittingly) of inverse probability.

In 1774 the collected works of Guillaume Jacob ’s Gravesande appeared.
Here, in Part II of the Introduction a la Philosophie, contenant la Meta- 
physique, et la Logique, may be found, in chapter XVII, “De la probabilité” , 
what in effect is an example of an inverse to Bernoulli’s theorem (though 
it amounts to little more than the advocating of the approximation of a 
probability by an observed frequency, and the mentioning that the error 
involved in such an approximation decreases as the number of trials in-
creases). Since this work was apparently first printed in 1736, however,
it falls outside the ambit of the present study. 


Condorcet 


The Productions of an exalted Genius are 
very liable to Misconstruction and Cavil, 
as the Subject is often clouded with some 
natural Intricacy. 


Francis Blake. 


6.1 Introduction 


Marie Jean Nicolas Caritat, Marquis de Condorcet (1734-1794) was a man 
of polymathic, if not polyhistoric, proportions. Pearson [1978] has described 
him as follows: 


there have been better mathematicians, better economists, bet- 
ter historians, better philosophers and better politicians than 
Condorcet, but scarcely any man has been at the same time 
as good a mathematician, as good an economist, as good an 
historian, as good a philosopher and as good a politician as he 
was. [p. 425] 


Of the some half-dozen writings by Condorcet considered in this chapter, 
two, a memoir and an essay, outstrip the others in importance. Although
the memoir was published in a number of parts (almost as separate papers) 
over a number of years, and although the essay was published during this 
period, we shall consider the former as a unit and discuss it in toto (where
relevant). 


6.2 Unpublished manuscripts 


The existence of two early probabilistic works by Condorcet, presently 
housed in the Bibliothéque de l’Institut de France, has been noted by Baker 
[1975, p. 4386]. The first of these, MS883, ff.216-221, was probably written 
in 1772: it contains nothing pertinent to the present study. The second, 
MS875, ff.84-99 (copy 100-109) dates from 1774, and bears the title “His-
toire abrigée de le calcul”. It is clear from the manuscript that the work
was revised at some stage, and it is in one of these revisions that the only 
reference to Bayes (a reference not repeated in the fair copy) is to be found, 
to wit, 


Les principes de les calculs se trouverant dans les Transactions 
Philosophiques annee 1764 No. LIII dans différens morceaux de 
M's Bayes et Price. 


The reference that this sentence replaced was to a memoir by Laplace 
“Imprime dans le Tome VI”: this is clearly a reference to Laplace’s paper 
of 1774, and suggests that Condorcet became aware of Bayes’s work after 
the publication of this paper of Laplace’s. 

Crepel [1987] has recently pointed out that the first of the manuscripts 
mentioned above, viz. MS883, ff.216-221, is really only the first part of a
longer work, the second part of which, Z30, ff.1-6, is housed in the Bureau
des Longitudes, while the third, MS875, ff.132-133, is to be found in the
Bibliothéque de l’Institut de France. An outline of the contents of these
fragments is given by Crepel (op. cit.): it does not appear that anything 
germane to the present work is to be found there. 


6.3 The Memoir 


This memoir, in six parts, was published in the Histoire de l’Académie
royale des Sciences for the years 1781, 1782, 1783 & 1784, although the
dates of publication are usually later than these dates.

The first part of the Mémoire sur le calcul des probabilités is enti-
tled Réflexions sur la régle générale qui prescrit de prendre pour valeur
d’un événement incertain, la probabilité de cet événement, multipliée par
la valeur de l’événement en lui-méme; and it occupies pp. 707-720 of the
volume for 1781 (although it was read on the 4th August, 1784). This 
part contains nothing pertinent: the second, however, filling pp. 720-728 
of the same volume and entitled Application de l’analyse a cette question: 
Déterminer la probabilité qu’un arrangement régulier est l’effet d’une in- 
tention de le produire, contains some observations that are at least slightly 
relevant. 

The first noteworthy detail concerns n possible combinations, of which 
only one is regular. 


Je suppose qu’il y ait n combinaisons possibles, & qu’une seule
d’elles soit réguliere. Si une cause a eu l’intention de produire
cette combinaison, elle a eu lieu nécessairement, & sa proba-
bilité sera 1; si, au contraire, elle a été l’effet du hasard, sa
probabilité sera 1/n. [Condorcet 1781, p. 720]


Applying what Pearson [1978, p. 454] describes as “inverse probability”
(though an argument framed in terms of odds might perhaps be more read-
ily understood), Condorcet says that cause and chance are then in the
ratio of 1 : 1/n, and hence the chance of a cause and the chance of a chance
are 1/(1 + 1/n) and (1/n)/(1 + 1/n) (i.e. n/(n + 1) and 1/(n + 1)) re-
spectively. As we have already seen (§5.1 above), these are the values given
by Mendelssohn, the first edition [1761] of whose Philosophische Schriften
antedates the publication of Bayes’s Essay by three years.

The second pertinent detail concerns sequences of regularities; specifi-
cally, the two series

1, 2, 3, 4, 5, 6, 7, 8, 9, 10
1, 3, 2, 1, 7, 13, 23, 44, 87, 167

or respectively,

$a_n = 2a_{n-1} - a_{n-2}$, n ∈ {2, 3, ..., 10}, & given $a_0 = 1$, $a_1 = 2$;

$a_n = a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4}$, n ∈ {4, 5, ..., 10},
& given $a_0 = 1$, $a_1 = 3$, $a_2 = 2$, $a_3 = 1$.

These symbolic formulations are in accord with what Condorcet himself
wrote; however, the first series could of course have been obtained in many
different ways (e.g. $a_0 = 1$ and $a_n = a_{n-1} + 1$, n ∈ {1, 2, 3, ..., 10}), and
any other method of obtaining it would change Condorcet’s solution. But
we shall not worry about this point: rather let us examine how Condorcet 
continues his example. 
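
The two laws of formation are easily checked by generating the terms. The following sketch (the function names are ours) produces the first ten terms of each series from the initial values and recurrences stated above.

```python
def first_series(length: int = 10) -> list:
    """Terms of a_n = 2*a_(n-1) - a_(n-2), starting from a_0 = 1, a_1 = 2."""
    a = [1, 2]
    while len(a) < length:
        a.append(2 * a[-1] - a[-2])
    return a[:length]


def second_series(length: int = 10) -> list:
    """Terms of a_n = a_(n-1) + a_(n-2) + a_(n-3) + a_(n-4), from 1, 3, 2, 1."""
    a = [1, 3, 2, 1]
    while len(a) < length:
        a.append(sum(a[-4:]))
    return a[:length]


if __name__ == "__main__":
    print(first_series())
    print(second_series())
```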

Keeping e terms of the first sequence and e′ of the second, one is assured
that the probability that the law of formation of the sequence will be con-
tinued q times is


(e + 1)/(e + q + 1) and (e′ + 1)/(e′ + q + 1)    (1)


respectively for the two sequences. This is essentially Pearson’s exposition
[1978, p. 455]: the original reads as follows:


Soit donc pour une de ces suites e le nombre des termes assu-
jettis à une loi, et e′ le nombre correspondant pour une autre
suite, et qu’on cherche la probabilité que pour un nombre q
de termes suivans, la méme loi continuera d’étre observée. La
premiére probabilité sera exprimée par (e + 1)/(e + q + 1), la
seconde par (e′ + 1)/(e′ + q + 1), et le rapport de la seconde à
la premiére par (e′ + 1)(e + q + 1)/(e + 1)(e′ + q + 1). [p. 722]


Although a numerical example is given, no argument is presented for the 
derivation of (1), the values in which are certainly those that would arise 
from an application of the rule of succession. Todhunter considers this
example in some detail in his Article 724; but in view of the arbitrariness
of the assumptions on which it is based, there seems little point in pursuing 
the matter further. 
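
That the values in (1) are those furnished by the rule of succession may nevertheless be verified in a few lines. The sketch below is ours: it assumes a uniform prior on the unknown chance that a term conforms to the law, and the values e = 8, e′ = 6 and q = 1 used in the demonstration are purely illustrative and are not claimed to be Condorcet's.

```python
from fractions import Fraction


def continuation_probability(e: int, q: int) -> Fraction:
    """Ratio of the integrals of x^(e+q) and x^e over [0, 1], i.e. (e+1)/(e+q+1)."""
    return Fraction(1, e + q + 1) / Fraction(1, e + 1)


def by_repeated_succession(e: int, q: int) -> Fraction:
    """The same probability as a product of q successive applications of (n+1)/(n+2)."""
    prob = Fraction(1)
    for k in range(q):
        prob *= Fraction(e + k + 1, e + k + 2)
    return prob


def ratio_second_to_first(e: int, e_prime: int, q: int) -> Fraction:
    """Ratio (e'+1)(e+q+1) / ((e+1)(e'+q+1)) of the second probability to the first."""
    return continuation_probability(e_prime, q) / continuation_probability(e, q)


if __name__ == "__main__":
    print(continuation_probability(8, 1), continuation_probability(6, 1))
    print(ratio_second_to_first(8, 6, 1))
```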

The third part of the memoir appeared in the volume for 1782 (published 
1785), pp. 674-691, and is entitled Sur l’évaluation des droits éventuels. 
Writing of this part, Todhunter [1865] says that it is 


neither important nor interesting, and it is disfigured by the 
contradiction and obscurity which we have noticed in Con- 
dorcet’s Essay. [art. 728] 


However, Todhunter devotes some three pages [arts 726-732] to a discus- 
sion of this trivial and tedious tractate, while Pearson found it (or at least 
parts of it) worthy of fairly detailed comment in his historical lectures 
[1978, pp. 455-457]. For us, the importance of this memoir lies in its use 
of multiple Bayes’s integrals, introduced (as we shall see in Chapter 7) by 
Laplace³ in 1778.

Condorcet begins by examining the case in which the cause (or event) 
by which the right is produced necessarily happens in a certain length of
time (“as, for example, when the right accrues on every succession to the 
property” [Todhunter 1865, art. 728]), this case being followed by one in 
which the event does not necessarily happen (“as, for example, when the 
right accrues on a sale of the property, or on a particular kind of succession” 
(Todhunter loc. cit.)). Three methods are given for the first case: we shall 
discuss all three here, together with a variant presented by Todhunter. 

The first method proceeds as follows⁴: let $a_1, a_2, \ldots, a_n$ be the
number of years elapsing between two transfers (“mutations observées”)⁵ and
$b_1, b_2, \ldots, b_n$ the number of transfers corresponding to those intervals.
(This is somewhat vague: what is perhaps meant is that, starting from a
contingency realized in year $a_1 = 1$, one finds that $b_1$ further contingencies
become realized; in the second year ($a_2 = 2$), $b_2$ become realized, &c.)
Further, let 1 be the value of the right for any property whatsoever at the
moment of its transfer, and $1/m$ the annual interest of right 1. The problem
is to determine the total value of the right, as much for the actual transfer
as for all future transfers, this value being reported at the present time.
One knows that the right 1 that will only be due at the end of $z$ years will
then be given by $(m/(m+1))^z$, or abbreviated, by $c^z$.

If we then consider p successive transfers, of which $p_1$ occur at the end
of $a_1$ years, $p_2$ at the end of $a_2$ years, ... , $p_n$ at the end of $a_n$ years, it
is clear that, in whatever order these transfers succeed each other, the last
will happen at the end of $p_1a_1 + p_2a_2 + \cdots + p_na_n$ years; so that the sum
due for this transfer will always be


$$c^{a_1p_1 + a_2p_2 + \cdots + a_np_n}.$$


If, in the next place (“ensuite”), one denotes by $x_1$ the probability of
the transfer after $a_1$ years, $x_2$ the probability after $a_2$ years, ... and finally
$1 - x_1 - x_2 - \cdots - x_{n-1}$ the probability after $a_n$ years, the probability of
this p-th transfer that we are considering will be expressed by


$$\frac{p!}{p_1!\,p_2! \cdots p_n!}\, x_1^{p_1} x_2^{p_2} \cdots (1 - x_1 - x_2 - \cdots - x_{n-1})^{p_n};$$


so that the value of all the p-th transfers, each multiplied by its respective
probability, will be


$$\left[c^{a_1}x_1 + c^{a_2}x_2 + \cdots + c^{a_n}(1 - x_1 - x_2 - \cdots - x_{n-1})\right]^p,$$


which represents the mean value of the right of this transfer. The total
mean value, found by summing over p, is then given by


$$1 \Big/ \left[1 - c^{a_1}x_1 - c^{a_2}x_2 - \cdots - c^{a_n}(1 - x_1 - x_2 - \cdots - x_{n-1})\right].$$


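This latter result can be arrived at, as Crepel [1988a] suggests, by considering
a sequence $\{Y_1, Y_2, \ldots, Y_p\}$ of positive random variables representing
the “inter-arrival” times between the different transfers. Letting


$$Z_k = Y_1 + Y_2 + \cdots + Y_k$$


and taking expectations, we can write the total value V of the right as


$$V = 1 + E\Big(\sum_{k=1}^{\infty} c^{Z_k}\Big) = 1 + \sum_{k} E(c^{Z_k}).$$


Since the $Y_i$ are independent and identically distributed, it follows that


$$E(c^{Z_k}) = (E c^{Y_1})^k$$


and hence


$$V = 1 + \sum_{k} (E c^{Y_1})^k = 1/(1 - E c^{Y_1}).$$


If we now set


$$\Pr[Y_1 = a_i] = x_i, \quad i \in \{1, 2, \ldots, n\}, \quad \text{with } x_n = 1 - x_1 - \cdots - x_{n-1},$$


then


$$E(c^{Y_1}) = c^{a_1}x_1 + c^{a_2}x_2 + \cdots + c^{a_n}(1 - x_1 - x_2 - \cdots - x_{n-1}),$$


and Condorcet’s result obtains.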
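
The renewal argument may be illustrated numerically. The Python sketch
below (a modern illustration, with hypothetical values for the interest,
the intervals and their probabilities) builds the law of the time of the
k-th transfer by repeated convolution and compares the partial sums of
discounted values with the closed form 1/(1 - E c^Y). All function names
are illustrative.

```python
from fractions import Fraction

def expected_discount(c, years, probs):
    """E(c^Y) for one inter-arrival time Y with Pr[Y = years[i]] = probs[i]."""
    return sum(p * c ** a for a, p in zip(years, probs))

def total_value_closed_form(c, years, probs):
    """Total mean value of the right, 1 / (1 - E(c^Y))."""
    return 1 / (1 - expected_discount(c, years, probs))

def partial_value(c, years, probs, transfers):
    """1 + sum over k = 1..transfers of E(c^{Z_k}), Z_k built by convolution."""
    law = {0: Fraction(1)}   # law of Z_0 = 0 (the present transfer)
    value = Fraction(1)      # the present transfer is worth 1
    for _ in range(transfers):
        new_law = {}
        for z, pz in law.items():
            for a, p in zip(years, probs):
                new_law[z + a] = new_law.get(z + a, Fraction(0)) + pz * p
        law = new_law
        value += sum(pz * c ** z for z, pz in law.items())
    return value

# Hypothetical data: interest 1/20 (so c = 20/21), intervals of 1, 2, 3 years.
c = Fraction(20, 21)
years = (1, 2, 3)
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
r = expected_discount(c, years, probs)
print(float(partial_value(c, years, probs, 5)),
      float(total_value_closed_form(c, years, probs)))
```

The partial sums coincide with the finite geometric sums in E(c^Y), which
is the independence step of the argument; they approach the closed form
from below as more transfers are included.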
Noting that here the $x_i$ are neither given nor constant, Condorcet goes
on to say that one knows only that the event whose probability is expressed
by $x_1$ has happened $b_1$ times, that whose probability is expressed by $x_2$, $b_2$
times, &c. The mean value of the right for the p-th transfer will then be
expressed by


$$\frac{\int \left\{x_1^{b_1} x_2^{b_2} \cdots x_{n-1}^{b_{n-1}}\, y^{b_n} \left[c^{a_1}x_1 + c^{a_2}x_2 + \cdots + c^{a_n}y\right]^p dx_1\, dx_2 \ldots dx_{n-1}\right\}}{\int \left\{x_1^{b_1} x_2^{b_2} \cdots x_{n-1}^{b_{n-1}}\, y^{b_n}\, dx_1 \ldots dx_{n-1}\right\}}$$


where $y = (1 - x_1 - \cdots - x_{n-1})$, the integration being repeated n − 1 times
and the integrals⁶ being taken from $x_{n-1} = 0$ to $x_{n-1} = 1 - x_1 - \cdots - x_{n-2}$,
from $x_{n-2} = 0$ to $x_{n-2} = 1 - x_1 - \cdots - x_{n-3}$, ..., from $x_1 = 0$ to
$x_1 = 1$. The (n − 1)-fold integral in the denominator here is seen to be a
Dirichlet integral (see Whittaker and Watson [1973, §12.5]), its value being


$$\prod_{j=1}^{n} \Gamma(b_j + 1) \Big/ \Gamma\Big(\sum_{j=1}^{n} b_j + n\Big).$$


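To evaluate the integral in the numerator, let us firstly set $c^{a_j} = c_j$: then
we expand the term in crotchets as


$$\left[c_1x_1 + c_2x_2 + \cdots + c_{n-1}x_{n-1} + c_n(1 - x_1 - x_2 - \cdots - x_{n-1})\right]^p$$
$$= \sum_{(i)} \binom{p}{i_1, i_2, \ldots, i_n} \prod_{j=1}^{n} c_j^{i_j}\; x_1^{i_1} \cdots x_{n-1}^{i_{n-1}} (1 - x_1 - \cdots - x_{n-1})^{i_n},$$


where each $i_j$ is a non-negative integer with $\sum i_j = p$ and where $\sum_{(i)}$
denotes the sum over all possible values of $\{i_1, i_2, \ldots, i_n\}$. The integral under
discussion then becomes


$$\sum_{(i)} K \int x_1^{b_1+i_1} \cdots x_{n-1}^{b_{n-1}+i_{n-1}}\, y^{b_n+i_n}\, dx_1 \ldots dx_{n-1},$$


where $y = (1 - x_1 - \cdots - x_{n-1})$ and


$$K = \binom{p}{i_1, i_2, \ldots, i_n} \prod_{j=1}^{n} c_j^{i_j}.$$


Evaluation of the Dirichlet integral results in


$$\sum_{(i)} \binom{p}{i_1, \ldots, i_n} \prod_{j=1}^{n} c_j^{i_j}\; \frac{\prod_{j=1}^{n} \Gamma(b_j + i_j + 1)}{\Gamma\big(\sum_{j=1}^{n} (b_j + i_j) + n\big)}.$$


The mean value of the right for the p-th transfer is then


$$\sum_{(i)} \binom{p}{i_1, \ldots, i_n} \prod_{j=1}^{n} c_j^{i_j}\; \frac{\prod_{j=1}^{n} \Gamma(b_j + i_j + 1)}{\prod_{j=1}^{n} \Gamma(b_j + 1)} \cdot \frac{\Gamma\big(\sum_{j=1}^{n} b_j + n\big)}{\Gamma\big(\sum_{j=1}^{n} (b_j + i_j) + n\big)}.$$
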
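
A formula of this kind is easily mistranscribed, and a small numerical
check may be welcome. The Python sketch below evaluates the mean value of
the right for the p-th transfer as a sum over all index sets
$(i_1, \ldots, i_n)$ with $\sum i_j = p$, using factorials for the gamma
functions (the $b_j$ being integers). The function names and the data are
hypothetical; this is an illustration, not Condorcet's procedure.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial(p, idx):
    """Multinomial coefficient p! / (i_1! i_2! ... i_n!)."""
    out = factorial(p)
    for i in idx:
        out //= factorial(i)
    return out

def mean_pth_transfer(c, a, b, p):
    """Mean value of the right for the p-th transfer.

    c: discount factor (a Fraction); a: intervals a_j; b: observed counts b_j.
    """
    n, total_b = len(a), sum(b)
    total = Fraction(0)
    for idx in product(range(p + 1), repeat=n):
        if sum(idx) != p:
            continue
        term = Fraction(multinomial(p, idx))
        for a_j, b_j, i_j in zip(a, b, idx):
            term *= c ** (a_j * i_j)                                # c_j^{i_j}
            term *= Fraction(factorial(b_j + i_j), factorial(b_j))  # Gamma ratio
        total += term
    # Gamma(sum b_j + n) / Gamma(sum b_j + p + n)
    return total * Fraction(factorial(total_b + n - 1),
                            factorial(total_b + p + n - 1))

# Hypothetical data: two intervals (1 and 2 years) seen 3 times and once.
value = mean_pth_transfer(Fraction(9, 10), (1, 2), (3, 1), 1)
print(value)
```

For p = 1 the sum reduces to $\sum_j c^{a_j}(b_j + 1)/(\sum_j b_j + n)$,
i.e. each $x_j$ is replaced by its posterior mean.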
We now come to Condorcet’s second method. In the same notation as
that used before, he supposes that x is the probability of the transfer at
the end of the first year. Then $(1 - x)x$ is the probability of the transfer
at the end of the second year, $(1 - x)^2x$ the probability at the end of the
third, and so on. The value of the right of the first transfer is then


$$cx + (1 - x)c^2x + (1 - x)^2c^3x + \cdots,$$


the sum being
$$cx/(1 - c + cx).$$


The second, third etc. transfers result in the values
$$[cx/(1 - c + cx)]^2, \quad [cx/(1 - c + cx)]^3, \quad \text{etc.}$$
Then


Ajoutant donc à ces termes 1, valeur de la mutation que l’on
suppose avoir lieu, & être dûe à l’instant où l’on cherche à
évaluer le droit [p. 679],


one obtains, for the sum,
$$1 + cx/(1 - c + cx) + [cx/(1 - c + cx)]^2 + [cx/(1 - c + cx)]^3 + \cdots$$


$$= (1 - c + cx)/(1 - c).$$


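Condorcet then declares that “la valeur totale du droit” is given by


$$\int_0^1 (1 - x)^{\alpha} x^{\beta} \left[(1 - c + cx)/(1 - c)\right] dx \Big/ \int_0^1 (1 - x)^{\alpha} x^{\beta}\, dx.$$


(I have changed his notation, writing $\alpha$ and $\beta$ for
$$\sum_{1}^{n} (a_i - 1)b_i \quad \text{and} \quad \sum_{1}^{n} b_i$$
respectively.) This Bayes-type integral is easily seen to reduce to


$$1 + \frac{c}{1 - c}\, \frac{(\beta + 1)}{(\alpha + \beta + 2)},$$


or, in Condorcet’s notation, to


$$1 + \frac{c}{1 - c}\, \frac{(b_1 + b_2 + \cdots + b_n + 1)}{(a_1b_1 + a_2b_2 + \cdots + a_nb_n + 2)}.$$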

By now it will probably be quite clear that Todhunter has not erred 
in frequently drawing his readers’ attention to Condorcet’s obscure and 
often obnubilated oratory. We choose at this stage, therefore, to present an 
alternative approach to Condorcet’s second method, following Todhunter 
[1865, arts 729-730] and Pearson [1978, p. 457]. 

Suppose, then, that the right is equally likely to occur in any year (e.g.
change by sale, rather than death of present holder), with probability x,
say. If c is the present value of the fee to be paid in the event of the
right being realized, the value of the whole right is


$$x(c + c^2 + \cdots) = xc/(1 - c).$$


If during the past m + n years the event happened m times and failed to
happen n times, one might well estimate x by m/(m + n), in which case the
whole value of the right becomes [c/(1 − c)] [m/(m + n)]. Since, however,
Condorcet views x as unknown, the whole value of the right must rather
be taken as


$$\int_0^1 x^m(1 - x)^n\, xc(1 - c)^{-1}\, dx \Big/ \int_0^1 x^m(1 - x)^n\, dx$$


$$= \frac{c}{1 - c}\, \frac{B(m + 2, n + 1)}{B(m + 1, n + 1)}$$


$$= \frac{c}{1 - c}\, \frac{m + 1}{m + n + 2},$$


a result that differs from the preceding estimate by the replacement of m
and n by m + 1 and n + 1 respectively (a substitution of little moment if
m is large).
In his Article 730, Todhunter criticizes this second method on two ac- 
counts, viz. 


(i) Condorcet asserts that this method is applicable to his first case, that
is, one in which the event must happen in a given number of years. In 
an example such as he mentions, namely one where the right would 
accrue on the death of the present holder of the property, the method 
is clearly inapplicable, since the probability of the event concerned 
may well vary from year to year. This method would, however, be 
applicable in the second case — i.e. when the right is supposed to 
accrue from a sale (as we have in fact supposed in our discussion 
of this method), the probability of which latter event might well be 
supposed to be constant from year to year. 
(ii) The use of Bayes’s theorem here adds very little to our knowledge
when m + n is large; and when it is small, “our knowledge of the past
would be insufficient to justify any confidence in our anticipations of
the future.” [Todhunter 1865, art. 730]
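
Todhunter's second criticism can be made concrete. The Python sketch below
compares the value [c/(1 − c)][m/(m + n)] with the value obtained from the
Bayes-type integral, [c/(1 − c)][(m + 1)/(m + n + 2)], for a fixed
proportion of occurrences and growing m + n. The discount factor and the
counts are hypothetical, and the function names are mine.

```python
from fractions import Fraction

def plug_in_value(c, m, n):
    """Value of the right when x is estimated by m/(m + n)."""
    return c / (1 - c) * Fraction(m, m + n)

def bayes_value(c, m, n):
    """Value of the right when x is integrated out under a uniform prior."""
    return c / (1 - c) * Fraction(m + 1, m + n + 2)

c = Fraction(20, 21)          # hypothetical discount factor, c/(1 - c) = 20
gaps = []
for scale in (1, 10, 100):    # 3 occurrences in 10 years, 30 in 100, ...
    m, n = 3 * scale, 7 * scale
    gaps.append(bayes_value(c, m, n) - plug_in_value(c, m, n))
print([float(g) for g in gaps])
```

The difference between the two values shrinks roughly in proportion to
1/(m + n), which is the sense in which the integral "adds very little"
when the record is long.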


Finishing off his discussion of the second method detailed above, Pearson 
[1978, p. 457] writes “Todhunter not unjustly calls Condorcet’s method ‘an 
extravagant extension and abuse of Bayes’ Theorem’ ” (an opinion with 
which Crepel [1988a, p. 299] differs sharply). In writing this the worthy 
biometrician has erred: the quotation from Todhunter is in fact a reference 
to a later part of the memoir in which the total value arising from two 
different rights is investigated. 

Finally let us have a quick look at Condorcet’s third method. Here it is 
supposed that 


nous appellerons $z_1, z_2, \ldots, 1 - z_1 - z_2 - \cdots - z_{n-1}$, ou $z_n$,
les probabilités que l’évenement pour la succession duquel on
cherche la valeur du droit, sera dans la liste des évenemens
dont la mutation est arrivée au bout de $a_1, a_2, \ldots, a_n$ années,
& $x_1, x_2, x_3, \ldots, x_n$ les probabilités inégales pour les mutations
correspondantes à chaque intervalle [p. 681]


(notation altered). Two cases may then be considered: 


ou que dans la suite des évenemens celui qu’on considère appartiendra
toujours au même $z_i$, ou peut appartenir successivement
à tous. [p. 681]


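Under the first of these assumptions, the mean value of the right is


$$z_1\frac{1 - c + cx_1}{1 - c} + z_2\frac{1 - c + cx_2}{1 - c} + \cdots + z_n\frac{1 - c + cx_n}{1 - c},$$


and consequently “la formule qui représente le droit” (loc. cit.) is the ratio
of the (n − 1)-fold integral


$$\int\!\cdots\!\int z_1^{b_1} z_2^{b_2} \cdots (1 - z_1 - \cdots - z_{n-1})^{b_n} (y_1z_1 + y_2z_2 + \cdots + y_nz_n)\, dz_1 \ldots dz_{n-1}$$


to the (n − 1)-fold integral


$$\int\!\cdots\!\int z_1^{b_1} z_2^{b_2} \cdots (1 - z_1 - \cdots - z_{n-1})^{b_n}\, dz_1 \ldots dz_{n-1},$$


where the $y_i$ represent the following expressions used by Condorcet:


$$y_i = \frac{\int_0^1 x_i^{b_i}(1 - x_i)^{(a_i - 1)b_i} \left[(1 - c + cx_i)/(1 - c)\right] dx_i}{\int_0^1 x_i^{b_i}(1 - x_i)^{(a_i - 1)b_i}\, dx_i} = 1 + \frac{c}{1 - c}\, \frac{b_i + 1}{a_ib_i + 2}.$$


Both of these integrals being of the Dirichlet type, one finds relatively easily
that their ratio is


$$\frac{y_1(b_1 + 1) + y_2(b_2 + 1) + \cdots + y_n(b_n + 1)}{\sum_{i=1}^{n} (b_i + 1)}.$$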

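Under the second of the two assumptions mentioned in the last quotation,
the mean value of the right will be


$$\left[1 - \frac{cz_1x_1}{1 - c + cx_1} - \frac{cz_2x_2}{1 - c + cx_2} - \cdots - \frac{cz_nx_n}{1 - c + cx_n}\right]^{-1},$$


while “la valeur moyenne de cette formule pour toutes les valeurs de x” (loc.
cit.) and with Z denoting “cette valeur” (presumably that given above) is
given by the ratio⁷ of the ((n − 1)-fold) integral


$$\int\!\cdots\!\int z_1^{b_1} z_2^{b_2} \cdots (1 - z_1 - \cdots - z_{n-1})^{b_n}\, Z\, dz_1 \ldots dz_{n-1}$$


to


$$\int\!\cdots\!\int z_1^{b_1} z_2^{b_2} \cdots (1 - z_1 - \cdots - z_{n-1})^{b_n}\, dz_1 \ldots dz_{n-1}.$$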


Following his development of a multiple Bayes’s integral, Condorcet re- 
marks (“somewhat naively”, according to Pearson [1978, p. 457]) 


Nous ne dirons rien de plus de ces formules, si n’est qu’elles
s’intègrent par les méthodes connues, & que d’ailleurs on en
auroit des valeurs très-approchées, soit par la méthode donnée
par M. Euler, soit par celles que M. de la Place a exposées dans
ce même volume. [p. 682]


The fourth part of the memoir, published in 1786 (i.e. after the Essai)
in the volume for 1783, is entitled Réflexions sur la méthode de déterminer
la probabilité des événemens futurs, d’après l’observation des événemens
passés, and occupies pp. 539-553. The purpose of the work is summarized
succinctly in the opening words as follows:


Cette partie de l’Analyse qui enseigne à déterminer la probabilité
des évenemens futurs, d’après l’ordre qu’ont suivi les
évenemens passés du même genre que l’on a observés, est susceptible
d’un grand nombre d’applications utiles & curieuses;
j’ai cru en conséquence qu’il pourroit n’être pas inutile d’examiner
les principes sur lesquels cette Analyse est fondée; tel est
l’objet des Réflexions suivantes. [p. 539]


Despite the fact that Condorcet was a personal friend of Price’s⁹, there is
mention neither of the latter nor of Bayes¹⁰. Writing in the 1920’s Pearson
[1978] says 
It is simply the French custom¹¹, which never cites authorities,
so that it is impossible to say of a French work or memoir how
much or how little is original. Of course it is a very bad custom,
which has lasted from 1700 to the present day in France.
[p. 457]


We shall however find later mention of Bayes in Condorcet’s work. 

Condorcet begins by supposing that there are only two events A and
N, of a nature that we should today describe as “mutually exclusive and
only possible”, and that these two events have occurred m and n times
respectively. The probability, then, of having, in p + q trials¹², p events (or
occurrences of the event) A and q events N, will be


$$\frac{(p + q)!}{p!\, q!} \int_0^1 x^{m+p}(1 - x)^{n+q}\, dx \Big/ \int_0^1 x^m(1 - x)^n\, dx \tag{2}$$


“telle est la regle générale” [Condorcet 1783, p. 539]. (This rule also occurs
in the Essai; cf. Todhunter [1865, art. 704] and Dinges [1983, p. 74]: we
shall take up this point later on.¹³)
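
Formula (2) is easily evaluated exactly for small integers. The Python
sketch below (an illustration in modern terms; the function names are mine)
computes it from the elementary beta integral and checks two properties one
would expect: that for p = 1, q = 0 it gives the rule-of-succession value
(m + 1)/(m + n + 2), and that the probabilities for all possible splits of
p + q future trials sum to one.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(u, v):
    """Exact integral of x^u (1-x)^v over [0, 1] for non-negative integers."""
    return Fraction(factorial(u) * factorial(v), factorial(u + v + 1))

def predictive_probability(m, n, p, q):
    """Formula (2): chance of p further A's and q further N's after m A's, n N's."""
    return comb(p + q, p) * beta_integral(m + p, n + q) / beta_integral(m, n)

# Hypothetical record: 5 occurrences of A and 3 of N.
next_is_A = predictive_probability(5, 3, 1, 0)
total = sum(predictive_probability(5, 3, p, 4 - p) for p in range(5))
print(next_is_A, total)
```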

It is, I believe, important to consider Pearson’s [1978] comments on this 
formula: he says 


This is the generalised Bayes’ Theorem; it is the generalisation 
which is due to Condorcet. Bayes took p = 1 and q = 0. But 
Bayes is more correct than Condorcet, for he shows why he 
puts the ‘dx’ in on his hypothesis of first ball determining the
chance of success or failure. Condorcet does not explain where 
the dx comes from. I think it can only be explained by the 
Euler-Maclaurin bridge and in this case, we must suppose the 
differential coefficients finite at the terminals. The point is, I
think, an important one, because Condorcet starts from ball 
drawing in urns, and thus his is really the ratio of two numbers 
and not continuous unless the total number of balls in the urn 
be infinite. x would go by stages, and it may be just possible 
that for small m, n, p and q the terminal conditions do become
of some importance. [p. 458] 


Now one must bear in mind that Pearson’s History is composed of a 
series of lectures and was not designed by him for publication¹⁴. It is quite
possible, therefore, that any criticism one may level against this work might 
well have been removed had his intentions been otherwise. Nevertheless, it 
is, I feel, necessary to comment briefly on this passage. 


(i) The first, and perhaps the most important, remark is that Bayes’s
result is not that given here with p = 1 and q = 0. We have already
hinted (and shall say more on the matter in the chapter on Laplace)
that there is no reference to (the occurrence of) any future event in
Bayes’s Essay per se (although such an extension is of course made
by Price).


(ii) The “generalization”, if such we may call it, is not due to Condorcet: it
was in fact given by Laplace in 1774 in his¹⁵ Mémoire sur la probabilité
des causes par les évenements.


(iii) As minor comments, we might mention two points: firstly, there is in
fact no “dx” in Bayes’s work (he did not use integral notation). Secondly,
I have not managed to find, either in Todhunter’s discussion or
in the pertinent part of the original, any reference to the drawing of
balls from an urn. Such reference is however made in Laplace’s memoir
cited in (ii) above, and we shall return to this in the appropriate
chapter.


After presenting (and I use this word purposely, for no further argument 
is given) this formula, Condorcet points out that it really expresses the 
probability only in the case of the following two hypotheses: 


1. Si la probabilité des évenemens A & N reste la même dans
toute la suite des évenemens; cela est évident par la formule
même qui exprime la loi.

2. Dans le cas où cette même probabilité est variable, mais
où l’on supposeroit en même temps que la valeur de la probabilité,
quoique pouvant être différente pour chaque évenement,
est cependant prise au hasard pour chacun, d’après une certaine
probabilité générale x pour A, & 1 − x pour N. [p. 540]


After some discussion of these hypotheses, Condorcet gives his definition 
of “probability” as 


n’est que le rapport du nombre des combinaisons qui amenent 
un évenement à celui des combinaisons qui ne l’amenent pas;
combinaisons que notre ignorance nous fait regarder comme 
également possibles [p. 540] 


and then relates this definition to the two hypotheses. He stresses that for 
any other hypothesis the formula cited should not be regarded as giving 
accurate results: in such a case a procedure advocated in the Essai [p. 179] 
may be adopted. 

Having noted that the same formula holds for any ordering of the m A’s 
and n N’s, Condorcet points out that when the assumption that x is con- 
stant contradicts that which reason indicates, one ought perhaps to use 
some method in which the probability depends on the order of the events: 
two cases involving variable x are briefly considered in the fourth article. 

In Article 5 Condorcet considers the case in which the probability may
differ from one event to another, although it is independent of the order in
which the events occur. Let t denote the total number of events, past or
future, let $t_1 = m + n$ be the number of past events, and let $t_2 = p + q$ be the
number of future events. Denoting by $x_1, x_2, \ldots, x_t$ the different
probabilities in favour of A, he gives, for the probability of p events A and
q events N in $t_2$ future events (instead of the earlier formula) the expression¹⁶


$$\frac{(p + q)!}{p!\, q!}\; \frac{\int\!\cdots\!\int [s_t/t]^{m+p}\, [1 - s_t/t]^{n+q}\, dx_1 \ldots dx_t}{\int\!\cdots\!\int [s_t/t]^{m}\, [1 - s_t/t]^{n}\, dx_1 \ldots dx_t} \tag{3}$$


where $s_t = \sum_{i=1}^{t} x_i$ and each integral is taken from 0 to 1. An expression
is also given for the probability that, in an unlimited sequence of events,
more A’s than N’s will occur.

To evaluate the ratio in (3), notice first that


$$I_{t;m,0} \equiv \int\!\cdots\!\int [(x_1 + \cdots + x_t)/t]^m\, dx_1 \ldots dx_t$$


$$= \frac{1}{t^m} \int\!\cdots\!\int \sum_{(k)} \binom{m}{k_1, \ldots, k_t} x_1^{k_1} \cdots x_t^{k_t}\, dx_1 \ldots dx_t,$$


where $\sum_{(k)}$ denotes the sum over all sequences $\{k_1, \ldots, k_t\}$ of non-negative
integers with $\sum k_i = m$. Thus


$$I_{t;m,0} = \frac{1}{t^m} \sum_{(k)} \binom{m}{k_1, \ldots, k_t} \prod_{i=1}^{t} \frac{1}{k_i + 1}$$


$$= \frac{1}{t^m} \sum_{(k)} \Gamma(m + 1) \prod_{i=1}^{t} \frac{1}{\Gamma(k_i + 2)}.$$


Alternatively, writing the multinomial coefficient as a product of binomial
coefficients, we have


$$I_{t;m,0} = \frac{1}{t^m (m + t)_t} \sum_{k_1=0}^{m} \binom{m + t}{k_1 + 1} \sum_{k_2=0}^{m - k_1} \binom{m + t - k_1 - 1}{k_2 + 1} \cdots \sum_{k_{t-1} \geq 0} \binom{m - k_1 - \cdots - k_{t-2} + t - (t - 2)}{k_{t-1} + 1},$$


where $(m + t)_t = (m + t)(m + t - 1) \cdots (m + 1)$, a form that is perhaps
easier for computation.
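Similarly,


$$I_{t;m,n} = \int\!\cdots\!\int [s_t/t]^m\, [1 - s_t/t]^n\, dx_1 \ldots dx_t$$


$$= \frac{1}{t^{m+n}} \sum_{(k)} \sum_{(j)} \binom{m}{k_1, \ldots, k_t} \binom{n}{j_1, \ldots, j_t} \prod_{i=1}^{t} \int_0^1 x_i^{k_i}(1 - x_i)^{j_i}\, dx_i$$


$$= \frac{m!\, n!}{t^{m+n}\, (m + n + t)!} \sum_{(k)} \sum_{(j)} \binom{m + n + t}{k_1 + j_1 + 1, \ldots, k_t + j_t + 1},$$


where $\sum_{(j)}$ runs over all sequences $\{j_1, \ldots, j_t\}$ of non-negative integers
with $\sum j_i = n$.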


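Thus, for example, if we take t = m + n + p + q = 2 + 0 + 1 + 0 = 3,


$$I_{t;m,0} = \frac{1}{t^m (m + t)_3} \sum_{k_1=0}^{m} \binom{m + 3}{k_1 + 1} \sum_{k_2=0}^{m - k_1} \binom{m - k_1 + 2}{k_2 + 1}$$


(where $(m + t)_3 = (m + t)(m + t - 1)(m + t - 2)$) and hence


$$I_{3;3,0}/I_{3;2,0} = 3/5$$


as given by Condorcet: that is, given two occurrences of A (and none of
N), the chance of a further occurrence of A is 3/5.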
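
The value 3/5 may be confirmed by exact arithmetic. The Python sketch below
does not follow the multinomial expansion; instead, as an alternative route
of my own, it obtains the moments of $s_t = x_1 + \cdots + x_t$ (the $x_i$
being independent and uniform on [0, 1]) by adding one variable at a time,
and then forms the ratio of integrals in (3). All function names are
illustrative.

```python
from fractions import Fraction
from math import comb

def uniform_sum_moment(t, k):
    """E[(x_1 + ... + x_t)^k] for independent x_i uniform on [0, 1], exactly."""
    moments = [Fraction(1)] + [Fraction(0)] * k   # moments of the empty sum
    for _ in range(t):
        # E[(S + X)^r] = sum_j C(r, j) E[S^(r-j)] E[X^j], with E[X^j] = 1/(j+1)
        moments = [
            sum(comb(r, j) * moments[r - j] * Fraction(1, j + 1)
                for j in range(r + 1))
            for r in range(k + 1)
        ]
    return moments[k]

def integral(t, a, b):
    """Integral of (s_t/t)^a (1 - s_t/t)^b over the unit cube in t dimensions."""
    return sum(comb(b, j) * (-1) ** j
               * uniform_sum_moment(t, a + j) / Fraction(t) ** (a + j)
               for j in range(b + 1))

def predictive_variable(m, n, p, q):
    """Formula (3) with t = m + n + p + q."""
    t = m + n + p + q
    return comb(p + q, p) * integral(t, m + p, n + q) / integral(t, m, n)

print(integral(3, 2, 0), integral(3, 3, 0), predictive_variable(2, 0, 1, 0))
```

Under the constant-probability formula the same data (two A's, no N) would
give (m + 1)/(m + n + 2) = 3/4 for a further A, so the two hypotheses do
lead to different numbers.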

In the sixth article Condorcet supposes the probability to be variable, 
but possibly dependent on the order of events. He is once again rather 
confusing, and I shall therefore quote the original: 


soit x' la probabilité du premier A, & 1 − x' celle du premier
N; $(x' + x'')/2$ & $(2 - x' - x'')/2$ pourront exprimer les probabilités
du second A ou du second N, $(x' + x'' + x''')/3$ &
$(3 - x' - x'' - x''')/3$ celles du troisieme A ou du troisieme N;
& celles des $r^{es}$ A ou N pourront l’être par
$(x' + x'' + x''' + \cdots + x^{(r)})/r$ & $(r - x' - x'' - x''' - \cdots - x^{(r)})/r$, où l’on voit
que x' est la probabilité de A au premier coup, x'' celle de A
au second si elle est différente de celle du premier, x''' celle de
A au troisième si elle est différente de celle des deux autres, &
ainsi de suite. [p. 545]


Noting the difficulties that can arise when future occurrences are to be
taken into account, Condorcet restricts his subsequent attention to the case
in which future events occur in the same order as that which has already
been observed (event E, say). If we let n be the number of constantly
occurring events, and p the number of future events, the probability that
that event (E) will occur (or that that law will be observed during the time
of p revolutions) will be expressed by


$$\frac{\int\!\cdots\!\int s_1 (s_2/2) \cdots (s_{n+p}/(n + p))\, dx_1 \ldots dx_{n+p}}{\int\!\cdots\!\int s_1 (s_2/2) \cdots (s_n/n)\, dx_1 \ldots dx_n}. \tag{4}$$


Todhunter, writing of formula (2) above, says



Condorcet quotes this result; he thinks however that better for- 
mulae may be given, and he proposes two. But these seem quite 
arbitrary, and we do not perceive any reason for preferring them 
to the usual formula. [1865, art. 734] 


However, as Pearson [1978, p. 459] has noted, Condorcet is in fact consid- 
ering three distinct problems, formulated in the seventh section of this part 
of the memoir as follows: 


1°. celle où la probabilité est constante, c’est-à-dire, où l’on
suppose chaque événement également probable, ou du moins
la probabilité moyenne pour chacun, déterminée d’une maniere
semblable; 2°. celle où l’on suppose cette probabilité variable,
mais indépendante du temps où les évenemens sont arrivés, &
de l’ordre dans lequel ils ont été observés; 3°. celle où on les
suppose dépendans, ou plutôt pouvant dépendre de cet ordre.
[pp. 548-549]


The solutions to these problems are those respectively given by formulae 
(2) — (4) above. In his comments on this section, Pearson writes 


In (i) Condorcet agrees with and generalises Bayes. This is an
advance, but no more than Bayes has he any hesitation about 
the equal distribution of ignorance. In (ii) he takes a mean value 
of all the unknown chances and integrates with regard to each 
of them. If he had integrated solely with regard to the mean 
chance he would have really fallen back on Bayes. I think to 
be accurate he ought to have recorded the success or failure at 
each trial and integrated the resulting products, and this would 
give the answer in the same manner as Bayes. If this be done 
it seems to me that we should get precisely the same result for 
(ii) and (iii) unless in (iii) we make some hypothesis as to the
correlation between successive x’s. [1978, p. 459] 


Let us now examine this quotation: 


(a) We have already commented on the claim that (2) is a generalization 
of Bayes’s result. Further, Pearson is perhaps a little too ready to say 
that both Condorcet and Bayes had no hesitation in using the “equal 
distribution of ignorance” assumption. We have previously discussed 
Bayes’s argument for this prior postulate, an argument that one must 
agree is singularly lacking in Condorcet’s work. 
(b) As regards the sentence starting “if he had integrated solely ... ”
this is clearly true.


(c) In the sentence starting “I think to be accurate ... ”, is Pearson
suggesting merely the integration, in the usual manner, of some product


Lape ay 
if 


(d) One must agree with Pearson as regards the hypothesis of correlation; 
and the hypothesis that Condorcet has in fact chosen is, like those on 
which other formulae presented in the memoir and of similar type to 
those already mentioned, are based, rather arbitrary. 


As a final example from this part of the memoir we instance that presented
in the ninth section. Here Condorcet supposes that two sequences S
and S' of events A and N have been observed, with A and N occurring m
and n times respectively in S, and m' and n' times respectively in S'. In
addition it is supposed that the ratio m : n differs sufficiently from the ratio
m' : n' that one may assume that the probability of A is not the same in
the two sequences. It is required to find the probability of getting p A’s and
q N’s in (p + q) future events. Letting x and 1 − x = z (x' and 1 − x' = z')
be the probabilities of A and N respectively in the first (second) sequence,
Condorcet defines X and X' by $X = x^m(1 - x)^n$ and $X' = (x')^{m'}(1 - x')^{n'}$.
He then considers in $(x + z + x' + z')^{p+q}$ the sequence of all terms in which
the sum of the exponents of x and x' is p, and that of z and z' is q. On our
letting $A\, x^{\alpha}(x')^{\alpha'} z^{\beta}(z')^{\beta'}$ be one of these terms, the resultant probability is
found to be


$$\frac{A \int X x^{\alpha} z^{\beta}\, dx \int X' (x')^{\alpha'} (z')^{\beta'}\, dx'}{\int X\, dx \int X'\, dx'},$$


the required probability being the sum of all the terms thus formed, provided
that it is equally probable that a future event belongs to either S or
S'.

If, contrariwise, one supposes that this same probability depends on the 
order observed in the two sequences, then the term given must be multiplied 


the required probability being found by summing all such terms and dividing
by


$$\left[\int (X\, dx + X'\, dx')\right]^{p+q}.$$


Finally one may suppose this probability ordered in accordance with the 
number of terms of each sequence, in which case the same term must be 
multiplied by


et eye aan 


taking the sum of all such terms and dividing by 
forra bas go)ttn dx» A 


In the fifth part of his memoir, Sur la probabilité des faits extraordinaires,
published on pp. 553-559 of the same volume of the Histoire de
l’Académie as the fourth, Condorcet devotes no little attention to the question
of testimony¹⁷. In doing so, he presents in the second section the following
argument:


Supposons que u désigne la probabilité d’un évenement A, &
e celle d’un événement N, que u' & e' désignent les probabilités
de deux autres évenemens A' & N'; $uu'/(uu' + ee')$ exprimera
la probabilité de la combinaison des événemens A, A';
& $ee'/(uu' + ee')$ la probabilité de celle des évenemens N, N'.
[p. 554]


An example involving the drawing of coins from an urn follows, and this 
in turn is followed by a testimonial example, in which the use of the discrete 
Bayes’s Theorem is perhaps more clearly expressed. The relevant passage 
runs as follows: 


Supposons maintenant que u & e représentent les probabilités
de la vérité d’un évenement extraordinaire & de la fausseté du
même événement, & qu’en même-temps u' & e' expriment la
probabilité qu’un témoignage sera ou non conforme à la verité,
& qu’un témoin ait assuré de la vérité de cet évenement. ...
ainsi la probabilité que l’évenement extraordinaire déclaré vrai
l’est réellement, sera $uu'/(uu' + ee')$, & celle qu’il est faux
$ee'/(uu' + ee')$. [pp. 554-555]


If we let E denote the truth of the extraordinary event, and E* the
conforming of the testimony to the truth of E, then u = Pr[E] = 1 − e,
u' = Pr[E* | E] and e' = Pr[E* | Ē]. Thus


$$\frac{uu'}{uu' + ee'} = \frac{\Pr[E]\, \Pr[E^* \mid E]}{\Pr[E]\, \Pr[E^* \mid E] + \Pr[\bar{E}]\, \Pr[E^* \mid \bar{E}]} = \Pr[E \mid E^*],$$


that is, the probability that an event declared to be true is really so¹⁸.
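
A numerical instance may make the force of the formula clearer. In the
Python sketch below the prior probability u of the extraordinary event and
the reliability u' of the witness are hypothetical figures of my own
choosing; the function name is likewise illustrative.

```python
from fractions import Fraction

def prob_true_given_testimony(u, u_prime):
    """uu'/(uu' + ee'), with e = 1 - u and e' = 1 - u'."""
    e, e_prime = 1 - u, 1 - u_prime
    return u * u_prime / (u * u_prime + e * e_prime)

# Hypothetical figures: the event has prior probability 1/100 and the
# witness speaks in conformity with the truth 9 times in 10.
posterior = prob_true_given_testimony(Fraction(1, 100), Fraction(9, 10))
print(posterior)
```

Even a fairly reliable witness thus leaves an event of small prior
probability improbable, which is the point of applying the rule to
extraordinary facts.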
While much of the rest of this part of the memoir is devoted to amplification
of the above formula, the main use of it is made in the sixth
part, Application des principes de l’article précédent à quelques questions
de critique, published in the Histoire de l’Académie for 1784, pp. 454-468.
It seems unnecessary to rehearse these applications here¹⁹.

We have had occasion, in the course of this section, frequently to com- 
ment on the “obscurity and inutility” [Todhunter 1865, art. 753] in Con- 
dorcet’s writing. Others’ comments on this score are reported in Todhunter, 
Article 753: the last sentence of this article is well worth repeating:


Condorcet seems really to have fancied that valuable results 
could be obtained from any data, however imperfect, by using 
formulae with an adequate supply of signs of integration. 


Gouraud’s opinion of the memoir is more glowing²⁰. Speaking of the
first four parts, in preparation for the writing of which Condorcet had 
spent three years in familiarizing himself with the calculus, in studying the 
general rules and methods and the principal kinds of application, Gouraud 
[1848] says that these researches 


produisirent de 1781 à 1783 les quatre premières parties d’un
vaste et beau mémoire où l’ingénieux géomètre déposa les résultats
de longues réflexions sur tout le passé de la théorie des
hasards, résultats précieux, dont la découverte faisait également
honneur au philosophe et à l’analyste. [p. 91]


A similar comment is made (op. cit.) on the last two parts of the memoir, 
viz. 


A la fin de 1783 et dans le courant de 1784, il montra dans
une cinquième et dernière partie du mémoire qui l’occupait
déjà depuis trois ans, que ces premiers travaux n’étaient que
les préliminaires d’une publication plus originale et plus hardie.
[p. 92]


6.4 Probabilité, from the Encyclopédie Méthodique


The mathematical part of the Encyclopédie Méthodique, ou par ordre de 
matiéres was published in three volumes in 1784, 1785 and 1789, the second 
of these having two articles entitled “Probabilité”. The first of these articles, 
pp. 640-649, is a reprint of the article under the same title from the earlier 
Encyclopédie ou Dictionnaire Raisonné; it is apparently by Diderot²¹, and
contains nothing useful to our purpose. The second article, pp. 649-663, 
is unsigned, but the last sentence makes it clear that the author was Con- 
dorcet. Devoted to general principles of the calculus of probabilities, the 
article is divided into three parts, only the third of which concerns us here. 




Condorcet’s aim in this third section is stated at the outset as follows: 


Jusqu’ici nous avons regardé le nombre des combinaisons qui 
donnent chaque événement comme déterminé & connu. Nous 
allons maintenant supposer ce nombre inconnu & variable, en 
sorte qu’il n’y ait plus une probabilité déterminée des événemens,
mais seulement une probabilité moyenne d’aprés laquelle on 
puisse déterminer celle de leur production. [p. 657] 


In the second article of this section he supposes that from an urn containing
black balls and white, n white and m black balls have been drawn.
What will then be the probability of drawing p white and q black balls?
Under the further assumption that the urn contains an infinite number
of balls, “afin que le rapport des boules blanches, au nombre total, puisse
avoir toutes les valeurs depuis 1 jusqu’à 0” [p. 657], Condorcet finds the
required probability to be


$$\binom{p + q}{p} \int_0^1 x^{n+p}(1 - x)^{m+q}\, dx \Big/ \int_0^1 x^n(1 - x)^m\, dx$$


$$= \frac{m + n + 1}{m + q + n + p + 1} \binom{p + q}{p} \binom{m + n}{n} \Big/ \binom{m + q + n + p}{n + p}.$$

Supposing next that n > m, Condorcet asks what the probability will be 
that in the sequence of events the number of white balls will exceed that of 
black by a given amount. Three conclusions about this probability present 
themselves, viz. 


1°. que cette probabilité ne peut jamais approcher indéfiniment
de 1; 2°. que, suivant les hypothèses de pluralité, elle peut,
après avoir été croissante, devenir décroissante; 3°. qu’après un
certain terme, elle continuera indéfiniment d’approcher de la
fonction


$$\int^{1/2} x^n (1 - x)^m\, dx \Big/ \int x^n (1 - x)^m\, dx > \tfrac{1}{2},$$


la formule $\int^{1/2} x^n (1 - x)^m\, dx$ indiquant que l’intégrale est prise
seulement depuis x = 1, jusqu’à x = 1/2. [p. 657]


The following is an attempt at an explanation of the above passage. 

Let W and B denote the numbers of white and black balls in the sequence, 
with W + B = N. Then W > B ⟹ W/N > 1/2. Moreover, if W = B + δ, with 
δ > 0, then W/N = (1/2) + δ/2N, and hence 


Pr[W > B + δ] = Pr[W/N > (1/2) + δ/2N] . 


Clearly this probability increases with increasing N and decreases with 
increasing δ, provided that the ratio W/N is unchanged. Furthermore, if 
δ is fixed, this probability will decrease as N → ∞, i.e. the probability 
does not tend to 1. Finally, note that 


$$J \equiv \Pr[W/N > 1/2] = \frac{\int_{1/2}^{1} x^{n}(1-x)^{m}\,dx}{\int_{0}^{1} x^{n}(1-x)^{m}\,dx}
= \frac{\int_{0}^{1/2} x^{m}(1-x)^{n}\,dx}{B(m+1,n+1)}. \tag{5}$$


If, as stated at the outset, n > m, then m/(m+n) < 1/2. Recognizing that 
m/(m+n) is the mode of the beta density in (5), we find that 

n > m ⟹ mode < 1/2. 

It thus follows from (5) that J > 1/2. 
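
As a numerical check on the argument just given, the following minimal Python sketch evaluates J exactly for small n and m. It is an illustration only and not anything found in Condorcet: the function names `beta_integral` and `prob_majority_white` are our own, and a uniform prior on the proportion of white balls is assumed, as in (5).

```python
from fractions import Fraction
from math import comb


def beta_integral(i, j, lo, hi):
    """Exact integral of x**i * (1 - x)**j over [lo, hi].

    (1 - x)**j is expanded binomially and integrated term by term.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    total = Fraction(0)
    for k in range(j + 1):
        power = i + k + 1
        total += comb(j, k) * (-1) ** k * (hi ** power - lo ** power) / power
    return total


def prob_majority_white(n, m):
    """J = Pr[W/N > 1/2] after n white and m black balls have been drawn."""
    half = Fraction(1, 2)
    return beta_integral(n, m, half, 1) / beta_integral(n, m, 0, 1)
```

For instance, `prob_majority_white(3, 1)` exceeds 1/2 while `prob_majority_white(1, 3)` falls below it, in agreement with the conclusions drawn for n > m and n < m.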

Condorcet next considers the case in which n < m (though this is 
mistakenly printed in the original as m < n), and concludes that in this 
case J < 1/2. Similarly it follows that, in an infinite number of future 
draws, 


$$\Pr[W/N > p/(p+q)] = \frac{\int_{0}^{1-\alpha} x^{m}(1-x)^{n}\,dx}{\int_{0}^{1} x^{n}(1-x)^{m}\,dx}$$


(where α = p/(p+q)), a probability that exceeds, or is less than, 1/2 
according as the mode m/(m+n) is less than, or greater than, q/(p+q). 
Finally (at least in this subsection), it is shown that, for p′ > p, 


$$\Pr[\alpha < W/N < \beta] = I_{\beta}(m+1,n+1) - I_{\alpha}(m+1,n+1),$$


where α = p/(p+q) and β = p′/(p+q). 

Condorcet next addresses himself to considering “s’il n’est question que 
d’une pluralité absolue ou proportionelle, observée entre les événemens” 
[p. 658] what the probability of indefinite continuation of this plurality 
may be. The answer in the case of absolute plurality is given as 


$$\int_{1/2}^{1} x^{a+b}(1-x)^{a}\,dx \Big/ \int_{0}^{1} x^{a+b}(1-x)^{a}\,dx ,$$


while for proportional plurality we have 


$$\int_{\gamma}^{1} x^{ca}(1-x)^{a}\,dx \Big/ \int_{0}^{1} x^{ca}(1-x)^{a}\,dx ,$$


with γ = c/(1+c). No argument for these solutions is presented: Condorcet 
is apparently assuming, in these two cases, that (in our previous notation), 
W = B + b and W = cB, with B = a in each case. He also derives an 
expression, in the case of proportional plurality, for the probability that 
W/N lies between two given functions of c. 

In the next subsection Condorcet applies the preceding theory to the 
question of births, showing that “tout restant dans le même état” the 
probability that in an indefinite period there will be more boys born 
than girls is 


$$\int_{1/2}^{1} x^{a+b}(1-x)^{a}\,dx \Big/ \int_{0}^{1} x^{a+b}(1-x)^{a}\,dx ,$$


where a + b is the number of boys and a is the number of girls. Further 
applications follow to problems of life annuities and contingent rights. 
Recalling in the tenth subsection that the probability has hitherto been 
regarded as constant in a sequence of events of the same type, Condorcet 
notes that this assumption may in some cases appear gratuitous. He 
supposes now that the events are independent of one another, keeping the 
same probability. In the notation introduced earlier, the probability of 
obtaining the event A, after A and N have been observed n and m times 
respectively, is (n+1)/(n+m+2). But if the events are independent, this 
same probability will be 


$$\int_{0}^{1} x\,dx = \frac{1}{2}.$$


Further, the probability of n A’s and m N’s is 


$$\binom{n+m}{m}\int_{0}^{1} x^{n}(1-x)^{m}\,dx$$


under the first hypothesis and 


$$\binom{n+m}{m}\left(\int_{0}^{1} x\,dx\right)^{n}\left(\int_{0}^{1}(1-x)\,dx\right)^{m}$$


under the second. These two probabilities are then in the ratio 


$$\frac{m!\,n!}{(m+n+1)!} : \frac{1}{2^{n+m}},$$


and consequently “la probabilité moyenne A” will be 


$$\left[\frac{(m+1)!\,n!}{(m+n+2)!} + \frac{1}{2^{n+m+1}}\right]\Big/\left[\frac{m!\,n!}{(m+n+1)!} + \frac{1}{2^{n+m}}\right].$$


An application to (n + m) tosses of a coin is then given (see Problem III 
of the Essai for further detail). 
Condorcet now focusses his attention on the first hypothesis used above, 
finding that it is legitimate in only two cases: 


1°. lorsque la probabilité de chaque événement est toujours la 
même, comme lorsqu’on tire des boules noires ou blanches 
toujours d’une même urne; 2°. lorsque les tirant d’urnes différentes, 
on suppose que ces urnes ont été remplies en prenant des boulles 
dans une masse commune, où elles étoient dans un certain 
rapport. [p. 660] 


In the first case he asserts that it is the probability itself that is constant, 
while in the second it is the mean probability²². An application to the 
drawing of cards from packs follows. 

A further modification is made in the twelfth subsection, where the fol- 
lowing assertion is made: 


On doit donc en général, & si l’on n’a pas à priori quelque 
raison d’adopter une autre hypothèse, regarder la probabilté 
non-seulement comme dépendante des évenemens, mais aussi 
comme dépendante de l’ordre qu’ils suivent entr’eux. [p. 661] 


The probabilities of successive occurrences of events of types A and B are 
then given respectively by the two sequences 


$$x,\quad (x+x')/2,\quad (x+x'+x'')/3,\ \ldots$$

and 

$$(1-x),\quad \big((1-x)+(1-x')\big)/2,\quad \big((1-x)+(1-x')+(1-x'')\big)/3,\ \ldots$$


The probability of a specified sequence of future events is then a fraction 
whose numerator is the repeated integral of the products of the probabilities 
of the events already observed and those expected, and whose denominator 
is the repeated integral of the products of the probabilities of the observed 
events: all integrals are taken over the unit interval. Further ramifications 
of typical Condorcetian character follow. Many of the results of this article 
are given in more detail in the Essai, and we shall consider them in due 
course. The article concludes with the following historical observations: 


La théorie exposée dans ce troisieme article est encore peu 
connue. MM. Price & Bayes en ont donné les principes 
fondamentaux dans les Transactions philosophiques des années 1764 & 
1765. M. Delaplace l’a traitée le premier analytiquement, & 
en a fait plusiers savantes applications dans les Mémoires de 
l’académie des sciences. On trouvera aussi quelques réflexions 
sur le même sujet dans l’ouvrage que j’ai publié sur la 
probabilité des décisions, & dans quelques mémoires insérés dans les 
volumes de l’académie, années 1781, 1782 & 1783. [p. 663] 


It is this last sentence, as we mentioned at the outset, that identifies 
Condorcet as the author of this article. 


6.5 The Essay 


The work entitled Essai sur l’application de l’analyse à la probabilité des 
décisions rendues à la pluralité des voix was published in Paris in 1785. 
Like so much of Condorcet’s work, this essay is fraught with difficulty. 
Todhunter is particularly severe on Condorcet in respect of this work²³: in 
his 1865 history he writes 


the difficulty does not lie in the mathematical investigations, 
but in the expressions which are employed to introduce these 
investigations and to state their results: it is in many cases al- 
most impossible to discover what Condorcet means to say. The 
obscurity and self contradiction are without any parallel, so far 
as our experience of mathematical works extends; some exam- 
ples will be given in the course of our analysis, but no amount 
of examples can convey an adequate impression of the extent of 
the evils. We believe that the work has been very little studied, 
for we have not observed any recognition of the repulsive pecu- 
liarities by which it is so undesirably distinguished. 

[art. 660] 


Gouraud’s praise, on the other hand, is as fulsome as usual; he writes 


cette remarquable composition, le traité de la plus longue haleine 
et du plus ambitieux dessein qui jusque-là, dans les cent 
cinquante ans d’existence de la théorie des hasards, eût attiré 
l’attention publique, par la nature des matières que l’auteur 
entreprend d’y soumettre au calcul, l’adresse des hypothèses 
auxquelles il se livre dans cet objet, la nouveauté des méthodes 
analytiques dont il fait usage, les vues immenses qu’il découvre 
à la géométrie, et, par-dessus tout cela, la sécurité sans égale 
avec laquelle il travaille à la conquête de la terre vierge encore 
où il aborde le premier, restera dans l’histoire de l’intelligence 
de l’homme comme un des plus naïfs et des plus éclatants 
témoignages de l’insatiable avidité de ses désirs et de ses 
espérances. [1848, pp. 94-95] 


Even when criticizing Condorcet Gouraud is incapable of suppressing his 
favourable views. Further on in the same work we find the following: 


Un style embarrassé, dénué de justesse et de coloris, une philoso- 
phie souvent obscure ou bizarre, une analyse que les meilleurs 
juges ont trouvée confuse, tels sont, sans préjuger d’ailleurs la 
légitimité de l'innovation de Condorcet, les défauts de l’ouvrage 
ou il en a consigné les principes: des idées ingénieuses et neuves, 
des méthodes originales, quelques traits d’une véritable 
éloquence, en font le mérite et les beautés. [p. 99] 


The essay consists basically of two parts: a Discours Préliminaire of cxci 
pages, and the Essai proper of 304 pages. We shall discuss these seriatim. 
Opinions on the usefulness of the preliminary discussion vary. Todhunter 
[1865, art. 661] writes 


We shall not delay on the Preliminary Discourse, because it is 
little more than a statement of the results obtained in the Essay. 
The Preliminary Discourse is in fact superfluous to any person 
who is sufficiently acquainted with Mathematics to study the 
Essay, and it would be scarcely intelligible to any other person. 


Pearson, on the other hand, in writing of Condorcet’s mathematical treat- 
ment, says “much light on these matters can be obtained from the pre- 
liminary discourse” [1978, p. 469]. We shall content ourselves here with 
discussing only those parts of the proem that are particularly pertinent to 
our present topic. 

The Essai being divided into five parts (plus a short introduction), the 
preliminary discourse is similarly partitioned. The aim of this discourse is 
clearly stated: 


ainsi j’ai cru devoir y joindre un Discours, où, après avoir exposé 
les principes fondamentaux du Calcul des probabilités, je me 
propose de développer les principales questions que j’ai essayé 
de résoudre & les résultats auxquels le calcul m’a conduit. Les 
Lecteurs qui ne sont pas Géomètres, n’auront besoin, pour juger 
de l’ouvrage, que d’admettre comme vrai ce qui est donné pour 
prouvé par le calcul. [p. ij] 


Basic to his theory is the following general principle²⁴: 


si sur un nombre donné de combinaisons également possibles, 
il y en a un certain nombre qui donnent un évenement, & un 
autre nombre qui donnent l’évenement contraire, la probabilité 
de chacun des deux événemens sera égale au nombre des 
combinaisons qui l’amènent, divisé par le nombre total. [p. v] 


A similar sentiment is expressed on p. lxxxvj. 
Condorcet next gives various results that we can express as follows: 


(i) for any event A, Pr[A] + Pr[Ā] = 1; 
(ii) if S denotes the certain event, Pr[S] = 1; 
(iii) Pr[A ∪ Ā] = 1; 
(iv) probability is expressed by a (proper) fraction, certitude by 1. 


He also considers the case in which the combinations are not equally 
possible: if one combination is twice as possible as another, the former 
should be viewed as two similar equipossible combinations. 

Condorcet goes on to say that one should not regard the above principle 
as limiting the definition of the probability of an event to an appropriate 
ratio of numbers of combinations. Rather, he believes it should include 
belief in the following sense²⁵: 


(i) if one knows the number of combinations that occasion an event, 
and the number that do not occasion it, and if the former exceeds 
the latter, then there is reason to believe that the event will happen 
rather than that it will not happen; 


(ii) this reason for belief increases as the ratio of the number of favourable 
combinations to the total number increases; and finally 


(iii) that it increases proportionally in the same ratio. 


He cites as a source of the proof of the last two statements Bernoulli’s 
Ars Conjectandi²⁶: both of them are, he states, consequences of the first, 
the latter being proved in the following way: however small the excess of 
the probability of one event may be over that of another, in a sequence 
of similar events one will find that the event of the greater of these two 
probabilities will occur more often than the other (a result proved in the 
Essai). Thus, by hypothesis, one will have reason to believe it will happen 
more often than the other, and consequently reason to believe that it will 
happen rather than fail to occur. 

In view of the attention we shall give later to Condorcet’s treatment of 
the rule of succession, it seems wise at this stage to give his definition of 
a future event, viz. “un évènement futur n’est pour nous qu’un évenement 
inconnu” [p. x]. A clear distinction is also drawn between certainty and 
probability: 


nous donnons le nom de certitude mathématique a la proba- 
bilité, lorsqu’elle se fonde sur la constance des loix observées 
dans les opérations de notre entendement. Nous appelons certi- 
tude physique la probabilité qui suppose de plus la méme con- 
stance dans un ordre de phénoménes indépendans de nous, & 
nous conservons le nom de probabilité pour les jugemens ex- 
posés de plus a d’autres sources d’incertitude. [p. xiv] 


After discussing various matters concerned with voting, Condorcet turns 
in his Analyse de la troisiéme Partie [pp. lxxxij—cxxviij] to matters that 
directly concern us. The object of this part he describes as follows²⁷: 


nous nous proposons dans cette troisieme Partie de donner les 
moyens, 1°. de déterminer par l’observation la probabilité de la 
vérité ou de la fausseté de la voix d’un homme ou de la décision 
d’un Tribunal; 2°. de déterminer également, pour les différentes 
espèces de questions qu’on peut avoir à résoudre, la probabilité 
que l’on peut regarder comme donnant une assurance suffisante, 
c’est-à-dire, la plus petite probabilité dont la justice ou la 
prudence puisse permettre de se contenter. [p. lxxxi] 


The first of these questions he proposes to answer in two different ways: 


(a) by determining the probability of a future judgment, from the know- 
ledge of the truth or falsity of judgments already delivered, and 


(b) by determining the probability of a future judgment, from those of 
judgments delivered, using only the hypothesis that the probability 
that one opts rather for truth than for error, is at least 1/2. 


He also states that it is to be assumed in such calculations that the law of 
the events is constant. 
He passes on next to the rule of succession, phrasing it as follows: 


que pour avoir la probabilité d’un évenement futur, d’après la 
loi que suivent les événemens passés, il faut prendre, 1°. la 
probabilité de cet évenement dans l’hypothèse que la production en 
est assujettie à des loix constantes; 2°. la probabilité du même 
évenement dans le cas où la production n’est assujettie à aucune 
loi; multiplier chacune de ces probabilités par celle de la 
supposition en vertu de laquelle on l’a déterminée, & diviser la somme 
des produits par celle des probabilités des deux hypothèses. 

[p. lxxxiv] 


This is illustrated by a numerical example: we shall postpone any discussion 
of this point until the pertinent part of the Essai proper. 

Condorcet passes on next to what we recognize as a discrete form of 
Bayes’s Theorem²⁸, one which we can write as 


$$\Pr[H_i \mid E] = \Pr[E \mid H_i] \Big/ \sum_{j} \Pr[E \mid H_j] .$$


This is in turn followed by a verbal statement of what is essentially the 
theorem of total probabilities, i.e. Pr[E] = Σᵢ Pr[EHᵢ], which in turn is 
followed by the curious remark that 


ce n’est donc pas la probabilité réelle que l’on peut obtenir par 
ce moyen, mais une probabilité moyenne. [p. lxxxvj] 


In his Analyse de la quatrième Partie Condorcet discusses the 
application of the methods of his third part to certain voting situations. 
He emphasizes that, when one has past data to consider, it is only the 
pertinent information that must be taken into account²⁹: 


lorsqu’il s’agiroit de déterminer la probabilité d’une nouvelle 
décision, on emploiroit, non la totalité des décisions passées, 
mais seulement le systême de celles où le rapport de la 
pluralité au nombre des Votans est à peu-près le même que dans 
la nouvelle décision. [p. cxxx] 


The two methods discussed in the third Part, while both being usable 
in the questions of the fourth Part, may be appropriate in different cases: 
indeed, 


si au lieu considérer la distribution des voix dans les décisions, 
on considéroit les décisions en elles-mémes, alors il faudroit 
préférer la premiere méthode, la seconde ne pouvant s’appliquer 
a cette derniere question qu’avec difficulté, & ne pouvant méme 
conduire alors qu’a des résultats hypothétiques. [p. cxxxviij] 


Condorcet next provides an example to distinguish between the real 
probability of the truth of a proposition and the probability that this same 
proposition has a certain degree of absolute or mean probability. The ex- 
ample concerns withdrawals from an urn (or urns) containing white and 
black balls, under the following conditions: 


(i) there are two urns, the numbers of white and black balls present 
being known to the drawer, who also knows from which urn the ball 
is taken; 


(ii) one or more witnesses testify as to which urn the ball comes from 
(such testimony having a certain probability of being true); 


(iii) the witnesses have concluded on the basis of past drawings, which of 
the urns contains more white balls; 


(iv) the drawer is completely ignorant of the composition of the urns (in 
this case only a mean probability is available). 


So much for the Discours Préliminaire: we pass on now to the Essai 
proper³⁰. 

The Essai opens with a two-page introduction summarizing the contents 
of its five parts. Earlier parts of the essay not being pertinent, let us turn 
our attention immediately to the paragraph in the introduction that is 
connected with the third part: 


dans le troisieme, on cherchera une méthode pour s’assurer a 
posteriori du degré de probabilité d’un suffrage ou de la décision 
d’une assemblée, & pour déterminer les dégres de probabilité 
que doivent avoir les différentes espéces de décisions. [p. 2] 


The problems to be discussed in this third part, Condorcet states, require 
firstly 


qu’on ait établi en général les principes d’après lesquels on peut 
déterminer la probabilité d’un évenement futur ou inconnu, non 
par la connoissance du nombre des combinaisons possibles qui 
donnent cet événement, ou l’évenement opposé, mais seulement 
par la connoissance de l’ordre des évenemens connus ou passés 
de la même espèce. [p. 176] 


To this end Condorcet discusses thirteen problems, in which both the rule 
of succession and Bayes’s Theorem are illustrated³¹: we shall consider these 
problems seriatim. 


Problem 1 
Soient deux évènemens seuls possibles A & N, dont on ignore 
la probabilité, & qu’on sache seulement que A est arrivé m fois, 
& N, n fois. On suppose l’un des deux événemens arrivés, & on 
demande la probabilité que c’est l’evenement A, ou que c’est 
l’evenement N, dans l’hypothèse que la probabilité de chacun 
des deux événemens est constamment la même. [p. 176] 


Let H₁ denote this hypothesis, and let x denote the probability of A. 
The probability of m A’s and n N’s (event E, say)³², is 
$\binom{m+n}{n} x^{m}(1-x)^{n}$. Hence the probability of E “pour toutes 
valeurs de x depuis zéro jusqu’à 1” [p. 177] will be given by 


$$\Pr[E \mid H_1] = \int_{0}^{1}\binom{m+n}{n} x^{m}(1-x)^{n}\,dx .$$


Proceeding similarly we can show that³³ 


$$\Pr[A \mid EH_1] = \int_{0}^{1} x^{m+1}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx = (m+1)/(m+n+2),$$


a similar result holding for Pr[N | EH₁]. 
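
The result of Problem 1 is easily confirmed by exact arithmetic. The sketch below is ours rather than Condorcet's: it assumes only the standard value m! n!/(m+n+1)! for the complete integral of x^m (1−x)^n over the unit interval, and the names `beta_integral`, `prob_next_A` and `prob_next_N` are hypothetical labels chosen for illustration.

```python
from fractions import Fraction
from math import factorial


def beta_integral(i, j):
    """Exact integral of x**i * (1 - x)**j over [0, 1], i.e. i! j! / (i + j + 1)!."""
    return Fraction(factorial(i) * factorial(j), factorial(i + j + 1))


def prob_next_A(m, n):
    """Pr[A | E, H1] after m occurrences of A and n of N, constant probability."""
    return beta_integral(m + 1, n) / beta_integral(m, n)


def prob_next_N(m, n):
    """Pr[N | E, H1] under the same hypothesis."""
    return beta_integral(m, n + 1) / beta_integral(m, n)
```

The ratio of the two integrals reduces to (m+1)/(m+n+2) for every m and n, and the probabilities of A and N sum to 1, as they must.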


Problem 2 
On suppose dans ce Probleme, que la probabilité de A & de 
N n’est pas la même dans tous les évenemens [hypothesis H₂, 
say], mais qu’elle peut avoir pour chacun une valeur quelconque 
depuis zéro jusqu’a l’unité. [p. 177] 


In this case, asserts Condorcet (and in the same notation as before)³⁴, 


$$\Pr[E \mid H_2] = \binom{m+n}{n}\left[\int_{0}^{1} x\,dx\right]^{m}\left[\int_{0}^{1}(1-x)\,dx\right]^{n}
= \binom{m+n}{n}\,2^{-(m+n)} .$$


Thus 


$$\Pr[AE \mid H_2] = \binom{m+n}{n}\left[\int_{0}^{1} x\,dx\right]^{m+1}\left[\int_{0}^{1}(1-x)\,dx\right]^{n}
= \binom{m+n}{n}\,2^{-(m+n+1)} ,$$


and hence 

$$\Pr[A \mid EH_2] = 1/2$$

(and similarly for Pr[N | EH₂]). Noting that this is the same as the result 
we would obtain on taking Pr[A] = 1/2 = Pr[N], we see that Condorcet 
seems to have confused the sentiment³⁵ “suppose that the probabilities are 
not constant” with “do not suppose that the probabilities are constant”. 


Problem 3 
On suppose dans ce probleme que l’on ignore si à chaque fois 
la probabilité d’avoir A ou N reste la même, ou si elle varie 
à chaque fois, de manière qu’elle puisse avoir une valeur 
quelconque depuis zéro jusqu’à l’unité, & l’on demande, sachant 
que l’on a eu m évènemens A, & n évenemens N, quelle est la 
probabilité d’amener A ou N. [p. 178] 


Two cases are considered here: 

(i) if the probability is constant (hypothesis H₁), 


$$\Pr[E \mid H_1] = \binom{m+n}{n}\, m!\,n!/(m+n+1)!$$


(ii) if the probability is not constant (hypothesis H₂), 


$$\Pr[E \mid H_2] = \binom{m+n}{n}\,2^{-(m+n)} .$$


Thus, under the implicit assumption of equal initial probabilities for H₁ 
and H₂, and using a discrete form of Bayes’s Theorem, we see that 


$$\Pr[H_1 \mid E] = \frac{m!\,n!}{(m+n+1)!}\Big/\left[\frac{m!\,n!}{(m+n+1)!}+\frac{1}{2^{n+m}}\right]$$

$$\Pr[H_2 \mid E] = 2^{-(n+m)}\Big/\left[\frac{m!\,n!}{(m+n+1)!}+\frac{1}{2^{n+m}}\right] .$$


Recalling that 

Pr[A | EH₁] = (m+1)/(m+n+2) , Pr[A | EH₂] = 1/2 
Pr[N | EH₁] = (n+1)/(m+n+2) , Pr[N | EH₂] = 1/2, 

we see finally that 

$$\begin{aligned}
\Pr[A \mid E] &= \Pr[A \mid EH_1]\Pr[H_1 \mid E] + \Pr[A \mid EH_2]\Pr[H_2 \mid E] \\
&= \frac{m+1}{m+n+2}\cdot\frac{m!\,n!}{(m+n+1)!}\Big/\left[\frac{m!\,n!}{(m+n+1)!}+\frac{1}{2^{n+m}}\right]
 + \frac{1}{2}\,2^{-(n+m)}\Big/\left[\frac{m!\,n!}{(m+n+1)!}+\frac{1}{2^{n+m}}\right] \\
&= \left[\frac{(m+1)!\,n!}{(m+n+2)!}+\frac{1}{2^{n+m+1}}\right]\Big/\left[\frac{m!\,n!}{(m+n+1)!}+\frac{1}{2^{n+m}}\right],
\end{aligned}$$

a similar expression holding for Pr[N | E]. 
As a remark Condorcet considers the ratio of the terms m! n!/(m+n+1)! 
and $2^{-(n+m)}$ when m = an and n → ∞. If a = 1, it follows from the 
Stirling-de Moivre approximation that, as n → ∞, 


$$\frac{m!\,n!}{(m+n+1)!}\Big/ 2^{-(n+m)} \sim \frac{\sqrt{\pi}}{2\sqrt{n}} \to 0 .$$


Furthermore, if a ≠ 1, the ratio tends to infinity as n → ∞. Condorcet 
then goes on to expand verbally on this result (for criticism see Todhunter 
[1865, art. 700]). 
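
The solution to Problem 3 and the accompanying remark can both be explored numerically. The following Python sketch is offered purely as an illustration under the assumptions made above, namely equal initial probabilities for the two hypotheses and the weights m! n!/(m+n+1)! and 2^−(m+n); the function names are our own inventions.

```python
from fractions import Fraction
from math import factorial


def weight_constant(m, n):
    """m! n! / (m + n + 1)!, proportional to Pr[E | H1] (constant probability)."""
    return Fraction(factorial(m) * factorial(n), factorial(m + n + 1))


def weight_varying(m, n):
    """2**-(m + n), proportional to Pr[E | H2] (probability varying each time)."""
    return Fraction(1, 2 ** (m + n))


def prob_next_A_mixed(m, n):
    """Pr[A | E], averaging over H1 and H2 with equal initial probabilities."""
    w1, w2 = weight_constant(m, n), weight_varying(m, n)
    post_h1 = w1 / (w1 + w2)
    post_h2 = w2 / (w1 + w2)
    return Fraction(m + 1, m + n + 2) * post_h1 + Fraction(1, 2) * post_h2


def weight_ratio(m, n):
    """Ratio of the two weights, the quantity considered in Condorcet's remark."""
    return weight_constant(m, n) / weight_varying(m, n)
```

With m = n the ratio returned by `weight_ratio` shrinks as n grows, whereas with m = 2n it eventually grows, which is the behaviour described in the remark.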


Problem 4 

On suppose ici un événement A arrivé m fois, & un évenement 
N arrivé n fois; que l’on sache que la probabilité inconnue d’un 
des évenemens soit depuis 1 jusqu’à 1/2 & celle de l’autre depuis 
1/2 jusqu’à zéro, & l’on demande, dans les trois hypothèses des 
trois problèmes précédens, 1°. la probabilité que c’est A ou N 
dont la probabilité est depuis 1 jusqu’à 1/2; 2°. la probabilité 
d’avoir A ou N dans le cas d’un nouvel événement; 3°. la 
probabilité d’avoir un évenement dont la probabilité soit depuis 1 
jusqu’à 1/2. [p. 180] 


Condorcet supposes firstly that the (unknown) probability is constant 
(hypothesis H₁). Denoting by p_A and p_N the probabilities of A and N we 
have³⁶ 


$$\Pr[E \,\&\, 0<p_A<1/2 \mid H_1] = \binom{m+n}{n}\int_{0}^{1/2} x^{m}(1-x)^{n}\,dx$$

$$\Pr[E \,\&\, 1/2<p_A<1 \mid H_1] = \binom{m+n}{n}\int_{1/2}^{1} x^{m}(1-x)^{n}\,dx ,$$


where E denotes the event that A and N have occurred m and n times 
respectively. Again by a tacit application of Bayes’s Theorem Condorcet 
deduces that 


$$\Pr[1/2<p_A<1 \mid EH_1]
= \frac{\Pr[1/2<p_A<1 \,\&\, E \mid H_1]}{\Pr[0<p_A<1/2 \,\&\, E \mid H_1] + \Pr[1/2<p_A<1 \,\&\, E \mid H_1]}
= \int_{1/2}^{1} x^{m}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx ,$$


and similarly 


$$\Pr[0<p_A<1/2 \mid EH_1] = \int_{0}^{1/2} x^{m}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx$$


(in each case the left-hand side is given by Condorcet as an unconditional 
probability). This completes the solution of the first question. 
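Proceeding to the second question we see that 


$$\Pr[A \mid 0<p_A<1/2 \,\&\, EH_1] = \int_{0}^{1/2} x^{m+1}(1-x)^{n}\,dx \Big/ \int_{0}^{1/2} x^{m}(1-x)^{n}\,dx$$

$$\Pr[A \mid 1/2<p_A<1 \,\&\, EH_1] = \int_{1/2}^{1} x^{m+1}(1-x)^{n}\,dx \Big/ \int_{1/2}^{1} x^{m}(1-x)^{n}\,dx .$$


Thus 

$$\begin{aligned}
\Pr[A \mid EH_1] &= \Pr[A \mid 0<p_A<1/2 \,\&\, EH_1]\,\Pr[0<p_A<1/2 \mid EH_1] \\
&\quad + \Pr[A \mid 1/2<p_A<1 \,\&\, EH_1]\,\Pr[1/2<p_A<1 \mid EH_1] \\
&= \int_{0}^{1} x^{m+1}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx \\
&= \frac{m+1}{m+n+2},
\end{aligned}$$

and similarly 

$$\Pr[N \mid EH_1] = \int_{0}^{1} x^{m}(1-x)^{n+1}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx = \frac{n+1}{m+n+2} .$$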


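Condorcet’s solution to the third question runs as follows: 

$$\begin{aligned}
&\Pr[(A \,\&\, 1/2<p_A<1) \vee (N \,\&\, 1/2<p_N<1) \mid EH_1] \\
&\quad= \Pr[A \,\&\, 1/2<p_A<1 \mid EH_1] + \Pr[N \,\&\, 1/2<p_N<1 \mid EH_1] \\
&\quad= \Pr[A \mid 1/2<p_A<1 \,\&\, EH_1]\,\Pr[1/2<p_A<1 \mid EH_1]
      + \Pr[N \mid 1/2<p_N<1 \,\&\, EH_1]\,\Pr[1/2<p_N<1 \mid EH_1] \\
&\quad= \frac{\int_{1/2}^{1} x^{m+1}(1-x)^{n}\,dx}{\int_{1/2}^{1} x^{m}(1-x)^{n}\,dx}\cdot
        \frac{\int_{1/2}^{1} x^{m}(1-x)^{n}\,dx}{\int_{0}^{1} x^{m}(1-x)^{n}\,dx}
      + \frac{\int_{1/2}^{1} x^{n+1}(1-x)^{m}\,dx}{\int_{1/2}^{1} x^{n}(1-x)^{m}\,dx}\cdot
        \frac{\int_{1/2}^{1} x^{n}(1-x)^{m}\,dx}{\int_{0}^{1} x^{n}(1-x)^{m}\,dx} \\
&\quad= \int_{1/2}^{1}\left[x^{m+1}(1-x)^{n} + x^{n+1}(1-x)^{m}\right]dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx \\
&\quad= \left[B_{1/2}(n+1,m+2) + B_{1/2}(m+1,n+2)\right]/B(m+1,n+1) .
\end{aligned}$$


Condorcet next considers the same questions assuming that³⁷ 


la probabilité changeante à chaque évenement, mais étant 
toujours pour le même, ou depuis 1 jusqu’à 1/2 ou depuis 0 jusqu’à 
1/2. [p. 182] 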


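The solution presented by Condorcet is most confusing: the following is an 
attempt at interpretation. 
We have firstly 


$$\Pr[E \,\&\, 0<p_A<1/2 \mid H_2] = \binom{m+n}{n}
\frac{\left(\int_{0}^{1/2} x\,dx\right)^{m}\left(\int_{0}^{1/2}(1-x)\,dx\right)^{n}}
     {\left(\int_{0}^{1} x\,dx\right)^{m}\left(\int_{0}^{1}(1-x)\,dx\right)^{n}}$$

$$\Pr[E \,\&\, 1/2<p_A<1 \mid H_2] = \binom{m+n}{n}
\frac{\left(\int_{1/2}^{1} x\,dx\right)^{m}\left(\int_{1/2}^{1}(1-x)\,dx\right)^{n}}
     {\left(\int_{0}^{1} x\,dx\right)^{m}\left(\int_{0}^{1}(1-x)\,dx\right)^{n}} ,$$


where H₂ denotes the hypothesis of changing probability. The numerators 
in these two ratios are given (correctly) by Condorcet as 


$$\binom{m+n}{n}\,3^{n}/8^{m+n} \quad\text{and}\quad \binom{m+n}{n}\,3^{m}/8^{m+n}$$


respectively. Hence 


$$\Pr[1/2<p_A<1 \mid EH_2] = \Pr[E \,\&\, 1/2<p_A<1 \mid H_2]/\Pr[E \mid H_2] = 3^{m}/(3^{m}+3^{n}) ,$$


a similar expression holding for Pr[0 < p_A < 1/2 | EH₂]. 
Condorcet now goes on to give 


la probabilité d’avoir une fois de plus l’événement A, si la 
probabilité de A est depuis 1 jusqu’à 1/2. [p. 182] 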


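This probability is found as follows: 


$$\begin{aligned}
\Pr[A \mid 1/2<p_A<1 \,\&\, EH_2]
&= \frac{\Pr[AE \,\&\, 1/2<p_A<1 \mid H_2]}{\Pr[E \,\&\, 1/2<p_A<1 \mid H_2]} \\
&= \frac{\binom{m+n}{n}\left(\int_{1/2}^{1} x\,dx\right)^{m+1}\left(\int_{1/2}^{1}(1-x)\,dx\right)^{n}\big/ D(m+1,n)}
        {\binom{m+n}{n}\left(\int_{1/2}^{1} x\,dx\right)^{m}\left(\int_{1/2}^{1}(1-x)\,dx\right)^{n}\big/ D(m,n)} \\
&= 3/4 ,
\end{aligned}$$


where $D(i,j) = \left(\int_{0}^{1} x\,dx\right)^{i}\left(\int_{0}^{1}(1-x)\,dx\right)^{j}$, and similarly 


Pr[A | 0 < p_A < 1/2 & EH₂] = 1/4. 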


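It then follows that 

$$\begin{aligned}
\Pr[A \mid EH_2] &= \Pr[A \mid 0<p_A<1/2 \,\&\, EH_2]\,\Pr[0<p_A<1/2 \mid EH_2] \\
&\quad + \Pr[A \mid 1/2<p_A<1 \,\&\, EH_2]\,\Pr[1/2<p_A<1 \mid EH_2] \\
&= \left[3^{m}(3/4) + 3^{n}(1/4)\right]/(3^{m}+3^{n}) ,
\end{aligned}$$

and similarly 

$$\Pr[N \mid EH_2] = \left[3^{m}(1/4) + 3^{n}(3/4)\right]/(3^{m}+3^{n}) ,$$

these being Condorcet’s solutions. 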
To answer the third question notice that 


$$\Pr[A \,\&\, 1/2<p_A<1 \mid EH_2] = 3^{m}(3/4)/(3^{m}+3^{n})$$

$$\Pr[N \,\&\, 1/2<p_N<1 \mid EH_2] = 3^{n}(3/4)/(3^{m}+3^{n}) .$$


Thus 


$$\Pr[(A \,\&\, 1/2<p_A<1) \vee (N \,\&\, 1/2<p_N<1) \mid EH_2] = 3/4 .$$


As the final case Condorcet considers the answering of these three 
questions under the assumptions of Problem 3. Under the two hypotheses 
H₁ and H₂, the respective probabilities of E are as 
$\int_{0}^{1} x^{m}(1-x)^{n}\,dx$ to $(3^{m}+3^{n})/4^{m+n}$, since 


$$\Pr[E \mid H_1] = \binom{m+n}{n}\int_{0}^{1} x^{m}(1-x)^{n}\,dx$$


and 


$$\Pr[E \mid H_2] = \binom{m+n}{n}(3^{m}+3^{n})/4^{m+n} .$$


It then follows, under the assumption that Pr[H₁] = Pr[H₂], that 
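$$\begin{aligned}
\Pr[1/2<p_A<1 \mid E]
&= \frac{\Pr[E \,\&\, 1/2<p_A<1 \mid H_1] + \Pr[E \,\&\, 1/2<p_A<1 \mid H_2]}{\Pr[E \mid H_1] + \Pr[E \mid H_2]} \\
&= \frac{\int_{1/2}^{1} x^{m}(1-x)^{n}\,dx + 3^{m}/4^{m+n}}{\int_{0}^{1} x^{m}(1-x)^{n}\,dx + (3^{m}+3^{n})/4^{m+n}} .
\end{aligned}$$

Similarly the probability of obtaining one more A is 


$$\Pr[A \,\&\, 1/2<p_A<1 \mid E]
= \frac{\int_{1/2}^{1} x^{m+1}(1-x)^{n}\,dx + 3^{m+1}/4^{m+n+1}}{\int_{0}^{1} x^{m}(1-x)^{n}\,dx + (3^{m}+3^{n})/4^{m+n}} ,$$


while the probability of getting an event (either A or N) with probability 
between 1/2 and 1 is 


$$\Pr[(A \,\&\, 1/2<p_A<1) \vee (N \,\&\, 1/2<p_N<1) \mid E]
= \frac{\int_{1/2}^{1} x^{m+1}(1-x)^{n}\,dx + \int_{1/2}^{1} x^{n+1}(1-x)^{m}\,dx + (3^{m+1}+3^{n+1})/4^{m+n+1}}
       {\int_{0}^{1} x^{m}(1-x)^{n}\,dx + (3^{m}+3^{n})/4^{m+n}} .$$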

Problem 5 


Conservant les mêmes hypothèses, on demande quelle est, dans 
le cas du probleme premier, la probabilité, 1°. que celle de 
l’évenement A n’est pas au-dessous d’une quantité donnée; 2°. 
qu’elle ne diffère de la valeur moyenne m/(m+n) que d’une 
quantité a; 3°. que la probabilité d’amener A, n’est point 
au-dessous d’une limite a; 4°. qu’elle ne diffère de la probabilité 
moyenne (m+1)/(m+n+2) que d’une quantité moindre que 
a. On demande aussi, ces probabilités étant données, quelle est 
la limite a pour laquelle elles ont lieu. [pp. 183-184] 


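The solution presented to 1° runs as follows: since 


$$\Pr[E] = \binom{m+n}{n}\int_{0}^{1} x^{m}(1-x)^{n}\,dx$$

and 

$$\Pr[EH] = \binom{m+n}{n}\int_{a}^{1} x^{m}(1-x)^{n}\,dx$$

(where H is the proposition a < p_A < 1), it follows that 


$$\begin{aligned}
M \equiv \Pr[H \mid E] &= \Pr[EH]/\Pr[E] \\
&= \int_{a}^{1} x^{m}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx \\
&= 1 - \sum_{k=0}^{n} \frac{(n)_{k}}{(m+1)(m+2)\cdots(m+k+1)}\,a^{m+k+1}(1-a)^{n-k} \Big/ \frac{m!\,n!}{(m+n+1)!} ,
\end{aligned}$$


where (n)_k = n(n−1)…(n−k+1). This result is more elegantly given 
in terms of the incomplete beta-function as 


M = 1 − B_a(m+1, n+1)/B(m+1, n+1), 


or 
M = 1 − I_a(m+1, n+1). 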
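
The finite sum for M arises from repeated integration by parts of the integral of x^m (1−x)^n from 0 to a. As a check on that expansion, and not as a transcription of Condorcet's own working, the following Python sketch compares it with a direct term-by-term integration; the coefficient (n)_k / [(m+1)…(m+k+1)] is an assumption derived from the integration by parts, and the function names are ours.

```python
from fractions import Fraction
from math import comb, factorial


def lower_integral_by_parts(m, n, a):
    """Sum over k of (n)_k / ((m+1)...(m+k+1)) * a**(m+k+1) * (1-a)**(n-k)."""
    a = Fraction(a)
    total = Fraction(0)
    coeff = Fraction(1, m + 1)                    # k = 0 coefficient
    for k in range(n + 1):
        total += coeff * a ** (m + k + 1) * (1 - a) ** (n - k)
        coeff = coeff * (n - k) / (m + k + 2)     # step from k to k + 1
    return total


def lower_integral_direct(m, n, a):
    """Integral of x**m * (1-x)**n over [0, a], expanding (1-x)**n binomially."""
    a = Fraction(a)
    return sum(comb(n, k) * (-1) ** k * a ** (m + k + 1) / (m + k + 1)
               for k in range(n + 1))


def prob_pA_at_least(m, n, a):
    """M = 1 - I_a(m+1, n+1), the probability that p_A is not below a."""
    complete = Fraction(factorial(m) * factorial(n), factorial(m + n + 1))
    return 1 - lower_integral_by_parts(m, n, a) / complete
```

For m = n = 1 and a = 1/2 both routes give 1/12 for the lower integral, so that M = 1/2, as symmetry requires.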


Proceeding to the second question, Condorcet states that 


$$\Pr[\alpha < p_A < 1 \mid E] = \int_{\alpha}^{1} x^{m}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx$$

$$\Pr[\beta < p_A < 1 \mid E] = \int_{\beta}^{1} x^{m}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx$$


where α = m/(m+n) + a, β = m/(m+n) − a. Subtraction of the first of 
these formulae from the second then gives Pr[β < p_A < α | E]. Condorcet 
evaluates this probability, obtaining an expression analogous to M in the 
preceding question; in fact 


$$\Pr[\beta < p_A < \alpha \mid E] = I_{\alpha}(m+1,n+1) - I_{\beta}(m+1,n+1) .$$


The solution to the third question is given, if a is always the limit of the 
probability of A, by M in 1°. 


On aura donc une probabilité égale que celle d’amener 
l’événement A n’est pas au-dessous de a. [p. 185] 


A similar expression to that in question 2° is given for 
Pr[(m+1)/(m+n+2) − a < p_A < (m+1)/(m+n+2) + a]. 


As a final remark Condorcet points out that the formulae given here serve 
equally to determine M in terms of a or a in terms of M, but that this 
latter value will be impossible to obtain rigorously. A general expression 
for M is given. 


Problem 6 
En conservant les mêmes données, on propose les mêmes 
questions pour le cas où la probabilité n’est pas constante. [p. 186] 


As was the case in Problem 4, the treatment presented here by 
Condorcet is difficult to follow. The solution offered below is consistent 
with those of earlier problems, and results in the answer obtained by 
Condorcet. 

In answer to the first question we note that 


$$\Pr[E \,\&\, a<p_A<1 \mid H_2] = \binom{m+n}{n}
\frac{\left(\int_{a}^{1} x\,dx\right)^{m}\left(\int_{a}^{1}(1-x)\,dx\right)^{n}}
     {\left(\int_{0}^{1} x\,dx\right)^{m}\left(\int_{0}^{1}(1-x)\,dx\right)^{n}} . \tag{6}$$


Thus 

$$\begin{aligned}
\Pr[a<p_A<1 \mid EH_2]
&= \frac{\Pr[E \,\&\, a<p_A<1 \mid H_2]}{\Pr[E \,\&\, a<p_A<1 \mid H_2] + \Pr[E \,\&\, 0<p_A<a \mid H_2]} \\
&= \frac{(1-a^{2})^{m}(1-2a+a^{2})^{n}}{(1-a^{2})^{m}(1-2a+a^{2})^{n} + a^{2m}(2a-a^{2})^{n}} . 
\end{aligned} \tag{7}$$

Proceeding to the second question, Condorcet finds, exactly as above, 
Pr[E & (b−a) < p_A < (b+a) | H₂] and the corresponding result 
analogous to (7). 
As regards the third question, we have, from (6) with m = 1 and n = 0, 

Pr[A & a < p_A < 1 | H₂] = ((1/2) − a²/2)/(1/2) = 1 − a², 

while, in answer to the fourth question, 

Pr[A & (b−a) < p_A < (b+a) | H₂] = (b+a)² − (b−a)². 
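
Formula (7) and the two single-trial results that follow it are readily evaluated. The Python sketch below is an illustration under the interpretation adopted in the text, in which each trial contributes the integral of x (or of 1−x) over the permitted range divided by its integral over the unit interval; the function names are hypothetical.

```python
from fractions import Fraction


def prob_pA_above(m, n, a):
    """Formula (7): Pr[a < p_A < 1 | E, H2] after m A's and n N's."""
    a = Fraction(a)
    above = (1 - a ** 2) ** m * (1 - 2 * a + a ** 2) ** n
    below = a ** (2 * m) * (2 * a - a ** 2) ** n
    return above / (above + below)


def prob_A_and_pA_above(a):
    """Third question: one further A with a < p_A < 1, i.e. 1 - a**2."""
    a = Fraction(a)
    return 1 - a ** 2


def prob_A_and_pA_within(b, a):
    """Fourth question: one further A with b - a < p_A < b + a."""
    b, a = Fraction(b), Fraction(a)
    return (b + a) ** 2 - (b - a) ** 2
```

Taking a = 1/2 in `prob_pA_above` recovers the value 3^m/(3^m + 3^n) found in Problem 4, which offers some reassurance that the two problems have been interpreted consistently.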
As a remark following this problem Condorcet points out that the case 
resulting from a combination of the previous two can readily be solved by 
using Problems 3, 5 and 6. 


Problem 7 
Supposant qu’un événement A est arrivé m fois, & qu’un 
événement N est arrivé n fois, on demande la probabilité que 
l’événement A dans q fois arrivera q − q’ fois, & l’évenement N, q’ fois. 
[pp. 187-188] 


Denoting by x and 1 − x the probabilities of A and N respectively, 
Condorcet shows in the usual way that³⁸ 


$$\begin{aligned}
\Pr[(q-q')\,A\text{'s} \,\&\, q'\,N\text{'s} \mid E]
&= \frac{\binom{m+n}{n}\binom{q}{q'}\int_{0}^{1} x^{m+q-q'}(1-x)^{n+q'}\,dx}{\binom{m+n}{n}\int_{0}^{1} x^{m}(1-x)^{n}\,dx} \\
&= \binom{q}{q'}\frac{(m+1)\cdots(m+q-q')\,(n+1)\cdots(n+q')}{(m+n+2)\cdots(m+n+q+1)} \\
&= \binom{q}{q'}\frac{B(m+q-q'+1,\,n+q'+1)}{B(m+1,\,n+1)} .
\end{aligned}$$


Condorcet follows this with a remark in which he gives the probabilities 
of the events 


q A’s; (q−1) A’s & 1 N; …; 1 A & (q−1) N’s; q N’s, 


and he notes that the sum of these probabilities, irrespective of the values 
m, n and q, must of necessity be 1. 
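
Condorcet's observation that these probabilities sum to 1 can be verified exactly for any particular m, n and q. The following sketch is ours: it uses the beta-function form of the result of Problem 7, and the names `beta_integral`, `predictive` and `predictive_distribution` are introduced purely for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(i, j):
    """Exact integral of x**i * (1 - x)**j over [0, 1], i.e. B(i + 1, j + 1)."""
    return Fraction(factorial(i) * factorial(j), factorial(i + j + 1))


def predictive(m, n, q, q_prime):
    """Pr[(q - q') A's and q' N's in q further trials | m A's and n N's observed]."""
    return (comb(q, q_prime)
            * beta_integral(m + q - q_prime, n + q_prime)
            / beta_integral(m, n))


def predictive_distribution(m, n, q):
    """Probabilities of the q + 1 possible outcomes, from q A's down to q N's."""
    return [predictive(m, n, q, k) for k in range(q + 1)]
```

With q = 1 and q' = 0 the function reduces to (m+1)/(m+n+2), the result of Problem 1.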


Problem 8 
On demande dans la méme hypothése, 1°. le nombre des événe- 
mens futurs étant 2q+1, la probabilité que le nombre des évene- 
mens N ne surpassera pas de 2q' + 1 le nombre des événemens 
A; 2°. la probabilité que le nombre des événemens A surpassera 
de 2q' + 1 le nombre des événemens N. [p. 189] 


The solutions are easily found on applying the result of the preceding 
problem: most of Condorcet’s five and a half page solution is concerned 
with manipulations of the initial expressions. 

Three remarks follow: in the first of these, Condorcet points out that 
the analogy between the formulae developed in this problem and those of 
the first part of the Essai shows that the latter may be used when m and 
n are large. In the second remark he finds the probability that the event 
A rather than N has happened, if one knows merely that one event has 
happened 2q' +1 times more than the other. Again this result is related to 
the corresponding one in Part 1. In the final remark various ratios of m to
n are considered.
Problem 9
Nous supposerons ici seulement que le nombre des Votans est
2q, & la pluralité 2q’, & qu’on demande V & V’ comme dans
le Problème précédent. [p. 197]


(Here V and V’ are the probabilities desired in 1° and 2° respectively in 
the previous problem.) The solution is followed by a remark analogous to 
the second remark following the preceding problem: neither the present 
solution nor the remark contributes anything new to our discussion. 


Problem 10
On demande, tout le reste étant le même, la probabilité que
sur 3q événemens, 1°. N n’arrivera pas plus souvent que A un
nombre q de fois, 2°. que A arrivera plus souvent que N un
nombre q de fois. [p. 199]


The method of solution parallels that of Problem 8, and will not be 
discussed here. Two remarks follow. 


Problem 11 

La probabilité étant supposée n’être pas constante comme dans
le Problème second, on demande 1°. la probabilité d’avoir sur q
événemens, q − q’ événemens A, & q’ événemens N; 2°. la probabilité
que sur 2q + 1 événemens, N n’arrivera pas un nombre
2q’ + 1 de fois plus souvent que A; 3°. la probabilité que A arrivera
un nombre 2q’ + 1 de fois plus souvent que N.

[pp. 204-205]


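Proceeding in the usual way we find that

$$\Pr[(q-q')\ A\text{'s} \;\&\; q'\ N\text{'s} \mid E]
= \binom{q}{q'} \frac{\left(\int_0^1 x\,dx\right)^{m+q-q'} \left(\int_0^1 (1-x)\,dx\right)^{n+q'}}{\left(\int_0^1 x\,dx\right)^{m} \left(\int_0^1 (1-x)\,dx\right)^{n}}
= \binom{q}{q'} \frac{1}{2^{q}}.$$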


This is the solution to the first question: the remaining two are special 
cases of certain results given in the first part of the Essai. In a remark
Condorcet points out that when one is ignorant as to which of the two 
hypotheses holds, one should proceed as in Problem 3. 
Problem 12
On suppose que la probabilité d’un des événemens est depuis
1 jusqu’à 1/2 & celle de l’autre depuis 1/2 jusqu’à zéro, & on demande
dans cette hypothèse; 1°. La probabilité que A arrivera
q − q’ fois dans q événemens, & N, q’ fois; ou que l’événement
dont la probabilité est depuis 1 jusqu’à 1/2 arrivera q − q’ fois,
& celui dont la probabilité est depuis 1/2 jusqu’à zéro, q’ fois.
2°. La probabilité que sur 2q + 1 événemens, N n’arrivera point
2q’ + 1 fois plus souvent que A; ou que l’événement dont la
probabilité est depuis 1/2 jusqu’à zéro, n’arrivera pas 2q’ + 1 fois
plus souvent que l’événement dont la probabilité est depuis 1
jusqu’à 1/2.
3°. La probabilité que sur 2q + 1 événemens, l’événement A
arrivera 2q’ + 1 fois plus que N; ou que l’événement dont la
probabilité est depuis 1 jusqu’à 1/2 arrivera 2q’ + 1 fois plus souvent
que celui dont la probabilité est depuis 1/2 jusqu’à zéro.
[pp. 205-206]


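The solution to the first question is as follows (cf. Problem 4):

$$\Pr[(q-q')\ A\text{'s} \;\&\; q'\ N\text{'s} \mid (1/2) < p_A \le 1 \;\&\; E\,H_1]
= \binom{q}{q'} \int_{1/2}^{1} x^{m+q-q'}(1-x)^{n+q'}\,dx \Big/ \int_{1/2}^{1} x^{m}(1-x)^{n}\,dx.$$

Similarly²⁹

$$\Pr[(q-q')\ A\text{'s} \;\&\; q'\ N\text{'s} \mid 0 < p_A < (1/2) \;\&\; E\,H_1]
= \binom{q}{q'} \int_{0}^{1/2} x^{m+q-q'}(1-x)^{n+q'}\,dx \Big/ \int_{0}^{1/2} x^{m}(1-x)^{n}\,dx.$$

Now

$$\Pr[1/2 < p_A \le 1 \mid E\,H_1] = \int_{1/2}^{1} x^{m}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx$$

and

$$\Pr[0 \le p_A < 1/2 \mid E\,H_1] = \int_{0}^{1/2} x^{m}(1-x)^{n}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx.$$

Thus, as in Problem 4,

$$\Pr[(q-q')\ A\text{'s} \;\&\; q'\ N\text{'s} \mid E\,H_1]
= \binom{q}{q'} \int_{0}^{1} x^{m+q-q'}(1-x)^{n+q'}\,dx \Big/ \int_{0}^{1} x^{m}(1-x)^{n}\,dx.$$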
By a procedure similar to that adopted in the solution to Problem 4, one
finds that

$$\Pr[((q-q')\ A\text{'s} \;\&\; 1/2 < p_A \le 1) \vee ((q-q')\ N\text{'s} \;\&\; 1/2 < p_N \le 1) \mid E\,H_1]$$

$$= \binom{q}{q'} \int_{1/2}^{1} \left[ x^{m+q-q'}(1-x)^{n+q'} + x^{n+q-q'}(1-x)^{m+q'} \right] dx \Big/ B(m+1,\, n+1),$$

where B(·,·) denotes the beta-function.
The solutions to the first parts of articles 2 and 3 follow as in Problem 8.
The answer to the second part of the second article is given as

$$\frac{1}{D} \Bigl\{ \int_{1/2}^{1} \left[ x^{m+2q+1}(1-x)^{n} + x^{n+2q+1}(1-x)^{m} \right] dx
+ (2q+1) \int_{1/2}^{1} \left[ x^{m+2q}(1-x)^{n+1} + x^{n+2q}(1-x)^{m+1} \right] dx + \cdots$$

$$+ \binom{2q+1}{q+q'} \int_{1/2}^{1} g(x; m, n, q, q')\,dx \Bigr\},$$

where

$$g(x; m, n, q, q') = x^{m+q-q'+1}(1-x)^{n+q+q'} + x^{n+q-q'+1}(1-x)^{m+q+q'}$$

and

$$D = \int_{0}^{1} x^{m}(1-x)^{n}\,dx.$$

The solution to the second part of article 3 follows on using formulae from
the first part of the Essai.

Condorcet points out in a remark that solutions to similar problems may 
now be obtained “sans peine”. 


Problem 13
On suppose que la probabilité n’est pas constante, &, les autres
hypothèses restant les mêmes que dans le Problème précédent,
on propose les mêmes questions. [p. 211]


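Proceeding as in the solution to Problem 4, we note firstly that

$$\Pr[(q-q')\ A\text{'s} \;\&\; q'\ N\text{'s} \mid 1/2 < p_A \le 1 \;\&\; E\,H_2]
= \binom{q}{q'} \frac{\left(\int_{1/2}^{1} x\,dx\right)^{m+q-q'} \left(\int_{1/2}^{1} (1-x)\,dx\right)^{n+q'}}{\left(\int_{1/2}^{1} dx\right)^{q} \left(\int_{1/2}^{1} x\,dx\right)^{m} \left(\int_{1/2}^{1} (1-x)\,dx\right)^{n}}. \qquad (8)$$

Similarly

$$\Pr[(q-q')\ A\text{'s} \;\&\; q'\ N\text{'s} \mid 0 < p_A < 1/2 \;\&\; E\,H_2]
= \binom{q}{q'} \frac{\left(\int_{0}^{1/2} x\,dx\right)^{m+q-q'} \left(\int_{0}^{1/2} (1-x)\,dx\right)^{n+q'}}{\left(\int_{0}^{1/2} dx\right)^{q} \left(\int_{0}^{1/2} x\,dx\right)^{m} \left(\int_{0}^{1/2} (1-x)\,dx\right)^{n}}. \qquad (9)$$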
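Thus on multiplying (8) and (9) respectively by the probabilities
$\Pr[1/2 < p_A \le 1 \mid E\,H_2]$ and $\Pr[0 < p_A < 1/2 \mid E\,H_2]$
(these being found as in Problem 3), and on setting

$$I_1 = \left(\int_{1/2}^{1} x\,dx\right)^{m+q-q'} \left(\int_{1/2}^{1} (1-x)\,dx\right)^{n+q'}$$

$$I_2 = \left(\int_{0}^{1/2} x\,dx\right)^{m+q-q'} \left(\int_{0}^{1/2} (1-x)\,dx\right)^{n+q'}$$

$$I_3 = \left(\int_{1/2}^{1} dx\right)^{q} \left(\int_{1/2}^{1} x\,dx\right)^{m} \left(\int_{1/2}^{1} (1-x)\,dx\right)^{n}$$

$$I_4 = \left(\int_{0}^{1/2} dx\right)^{q} \left(\int_{0}^{1/2} x\,dx\right)^{m} \left(\int_{0}^{1/2} (1-x)\,dx\right)^{n}$$

we eventually find that

$$\Pr[(q-q')\ A\text{'s} \;\&\; q'\ N\text{'s} \mid E\,H_2] = \binom{q}{q'} \frac{I_1 + I_2}{I_3 + I_4}
= \binom{q}{q'} \left(3^{m+q-q'} + 3^{n+q'}\right) \Big/ 4^{q} \left(3^{m} + 3^{n}\right).$$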


In a similar fashion one can show that

$$\Pr[((q-q')\ A\text{'s} \;\&\; 1/2 < p_A \le 1) \vee ((q-q')\ N\text{'s} \;\&\; 1/2 < p_N \le 1) \mid E\,H_2]
= \binom{q}{q'}\, 3^{q-q'} / 4^{q}.$$


The solutions to parts 2 and 3 are found in a manner analogous to 
that used in the corresponding parts of the previous problem. 
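
The closed form for Problem 13, C(q, q') (3^(m+q-q') + 3^(n+q')) / 4^q (3^m + 3^n), can be checked against a direct mixture calculation. The sketch below rests on one particular reading, which is an assumption rather than something established by the text: within each half-range every future trial is given the mean probability of that range (3/4 or 1/4 for A), and the two ranges are weighted by their probabilities given the past observations. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Integrals of x and of (1 - x) over the two half-ranges.
HI_A, HI_N = Fraction(3, 8), Fraction(1, 8)   # over (1/2, 1)
LO_A, LO_N = Fraction(1, 8), Fraction(3, 8)   # over (0, 1/2)
HALF = Fraction(1, 2)                          # length of each half-range


def by_mixture(m: int, n: int, q: int, q_prime: int) -> Fraction:
    """Weight the two half-range predictions by their weights given the data."""
    k = q - q_prime
    w_hi = HI_A**m * HI_N**n          # weight of the range (1/2, 1)
    w_lo = LO_A**m * LO_N**n          # weight of the range (0, 1/2)
    pred_hi = comb(q, q_prime) * (HI_A / HALF)**k * (HI_N / HALF)**q_prime
    pred_lo = comb(q, q_prime) * (LO_A / HALF)**k * (LO_N / HALF)**q_prime
    return (w_hi * pred_hi + w_lo * pred_lo) / (w_hi + w_lo)


def closed_form(m: int, n: int, q: int, q_prime: int) -> Fraction:
    """The closed-form expression quoted for Problem 13."""
    num = 3**(m + q - q_prime) + 3**(n + q_prime)
    return Fraction(comb(q, q_prime) * num, 4**q * (3**m + 3**n))


# Arbitrary illustrative values.
print(by_mixture(4, 2, 3, 1), closed_form(4, 2, 3, 1))
```

Under this reading the two computations agree exactly, and the closed form sums to 1 over q' = 0, ..., q, since the sum of C(q, q') 3^(q-q') over q' is 4^q.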
Condorcet now suspends his examination of such matters and goes on to
apply the preceding principles. The first question considered is concerned
with the finding³⁰

des moyens de déterminer, d’après l’observation, la valeur de la
probabilité de la voix d’un des Votans d’un Tribunal & celle de
la décision d’un Tribunal donné. [p. 213]


Two methods of solution are presented: the first does not concern us, and 
we shall comment but briefly on the second. In the latter, three hypotheses 
are considered: 


(i) in each decision the vote of each voter has a constant probability; 
(ii) the probability varies in each decision and for each voter; 


(iii) both (i) and (ii) may be admitted together, by multiplying the probability
that results from each by the probability that this hypothesis
arises.


Condorcet advises against considering (i) on its own, finding the desired 
probability to be purely mathematical. The second hypothesis leads to the 
results of Problems 4 and 13, and so only (iii) need be considered, and 
under this hypothesis the results of Problems 4, 12 and 13 are applicable. 

The remainder of this part of the Essai is devoted to the determination 
of the probabilities of decisions under certain conditions, and does not 
contribute anything to our study. 

In the introduction to the Essai Condorcet describes the scope of the
fourth part as follows: 


on donnera le moyen de faire entrer dans le calcul l’influence
d’un des Votans sur les autres, la mauvaise foi qu’on peut leur
supposer, l’inégalité de lumières entre les Votans & les autres
circonstances auxquelles il est nécessaire d’avoir égard pour rendre
la théorie applicable & utile. [p. 2]


Much use is made of the results of the third part: the integrals in the present 
part are not derived in as much detail as in the previous part, but no new 
results are to be found here³¹.

In the fifth part various applications of the preceding theory are given: 
once again nothing pertinent is to be found. 

The Essai concludes with the following words: 


la difficulté d’avoir des données assez sûres pour y appliquer le
calcul, nous a forcés de nous borner à des aperçus généraux &
à des résultats hypothétiques: mais il nous suffit d’avoir pu, en
établissant quelques principes, & en montrant la manière de les
appliquer, indiquer la route qu’il faut suivre, soit pour traiter
ces questions, soit pour faire un usage utile de la théorie.


[p. 304] 
What we have discussed here provides ample evidence of Condorcet’s
ability, not only in handling abstruse probabilistic concepts, but also in
rendering obscurum per obscurius. It is thus a bit severe of Cajori [1919a]
to dismiss the work with the words


[Condorcet’s] general conclusions are not of great importance; 
they are that voters must be enlightened men in order to ensure 
our confidence in their decisions. [p. 244] 


6.6 Discours sur l’astronomie et le calcul
des probabilités


This article³², containing little to our purpose, was read at the Lycée in
1787. In the second half of the paper we once again find a reference to 
Pascal, de Méré and Fermat as the originators of the probability calculus, 
and this is followed by a passage in which Pearson [1978, p. 503] finds 
Bayes’s Theorem used. The pertinent extract runs as follows: 


Nous prouverons que le motif de croire à ces vérités réelles,
auxquelles conduit le calcul des probabilités, ne diffère de celui
qui nous détermine dans tous nos jugements, dans toutes nos
actions, que parce que le calcul nous a donné la mesure de ce
motif, et que nous cédons, par l’assentiment éclairé de la raison,
à une force dont nous avons calculé le pouvoir, au lieu de céder
machinalement à une force inconnue. [p. 499]


I think an abundance of charity is needed to see any application of Bayes’s 
work here, and there is nothing else even remotely relevant in the paper. 


6.7 Élémens du calcul des probabilités


This work, the full title of which is Élémens du calcul des probabilités et son
application aux jeux de hasard, à la loterie, et aux jugemens des hommes.
Avec un discours sur les avantages des mathématiques sociales, was published
posthumously in An XIII (1805), together with an anonymous “notice
sur M. de Condorcet”. It is not discussed by Todhunter.

Intended as the fourth volume of Condorcet’s annotated edition of Euler’s
Lettres à une princesse d’Allemagne sur quelques sujets de physique et
de philosophie (an edition with which Lacroix was associated), this treatise
contains the following general comment in the introductory note:


On a justement reproché à tous les ouvrages mathématiques
de Condorcet, d’ailleurs remplis de découvertes profondes dans
l’analyse, d’être pénibles à lire et difficiles à entendre. Souvent
même les méthodes qu’il emploie sont tellement généralisées,
qu’elles échappent aux cas particuliers. Qu’il est loin de la clarté
transparente de l’analyse d’Euler, ou de la simplicité élégante
de celle de la Grange! [pp. vi-vij]


This book consists of seven articles³³, followed by a Tableau Général de
la Science³⁴. The first two articles³⁵ contain nothing relevant to the present
study: we thus turn our attention immediately to the third, “Des principes 
fondamentaux du calcul des probabilités” [pp. 56-79]. 

Speaking of equally possible events, Condorcet writes 


On cherche d’abord à déterminer le nombre de tous les événemens
également possibles, et il est absolument nécessaire de
remonter à ceux auxquels il est permis de supposer cette égale
possibilité, sans quoi le calcul deviendroit absolument hypothétique.
On cherche ensuite, dans ce nombre d’événemens également
possibles, quel est le nombre de ceux qui remplissent
une certaine condition, et on dit que la probabilité d’avoir un
événement qui remplisse cette condition, est exprimé par le second
de ces nombres divisé par le premier. [p. 56]


He then goes on to point out that 


Il n’est donc pas nécessaire, pour avoir la probabilité, de connaître
le nombre total des événemens, mais seulement le rapport
du nombre de ceux que l’on veut considérer avec ce nombre total.

[p. 57]
The addition formula for mutually exclusive events is phrased as 


la probabilité d’avoir l’un ou l’autre des événemens qui remplissent
des conditions différentes, est égale à la somme des probabilités
qu’on a pour les événemens qui remplissent chacune de
ces conditions. [p. 59]


Condorcet next considers the question of sampling with replacement 
from an urn containing four balls (say) (white or black). If four draws 
result in three white balls and one black (event E, say), one might be in-
terested in the probabilities of the various possible compositions of the urn. 
After some calculations, he passes on to consider the probability of getting 
a white ball on the next draw, all possible initial compositions of the urn 
being regarded as equally possible. This assumption 


est ici légitime, puisque, d’après la nature de la question, je suis
dans une ignorance absolue sur ce rapport; et la seule donnée
que j’aie pour évaluer la probabilité qu’il soit plutôt exprimé
par un nombre que par un autre, dépend de l’observation des
tirages successifs. [p. 68]
Denoting by x the probability of drawing a white ball, one finds that

$$\Pr[\text{white ball} \;\&\; E] = \int_0^1 4x^4(1-x)\,dx.$$

Having shown that

$$\int_0^1 x^{m}(1-x)^{n}\,dx = m!\,n!/(m+n+1)!,$$
0 


Condorcet next shows that the probability that x has a specific value (say
1/2) is nought, while the probability that x is 2/3 rather than 1/2 is given as
2³/3⁴ : 1/2⁴ (this being the ratio of x³(1 − x) at 2/3 to the same thing at
x = 1/2). He next evaluates

$$\Pr[(x > 1/2) \;\&\; E] = \int_{1/2}^{1} 4x^3(1-x)\,dx$$

and

$$\Pr[(x < 1/2) \;\&\; E] = \int_{0}^{1/2} 4x^3(1-x)\,dx.$$

The factor “4” is missing from both these expressions, which is not too
serious an omission since one is really concerned with finding “s’il est plus
probable que x est au dessus de 1/2 qu’au dessous” [p. 75]. More serious
is the fact that Condorcet evaluates these integrals (without the “4”) as
(1 − 1/2⁵)/4.5 and (1/2⁵)/4.5 respectively.
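
For comparison, the two integrals of x³(1 − x) (without the factor 4) can be evaluated exactly from the antiderivative x⁴/4 − x⁵/5. This is a small check added for illustration; the function name `antiderivative` is hypothetical.

```python
from fractions import Fraction


def antiderivative(x: Fraction) -> Fraction:
    """Antiderivative of x**3 * (1 - x), namely x**4/4 - x**5/5."""
    return x**4 / 4 - x**5 / 5


half = Fraction(1, 2)
below = antiderivative(half) - antiderivative(Fraction(0))  # integral over [0, 1/2]
above = antiderivative(Fraction(1)) - antiderivative(half)  # integral over [1/2, 1]

print(below, above)             # 3/320 and 13/320
print(above / (above + below))  # 13/16
```

The two parts add up to 1/20 = 1/(4.5), as they must, and the probability that x exceeds 1/2, given three white balls and one black, comes out as 13/16.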

Condorcet next shows that the probability of drawing a white ball after
n white and m black balls have been drawn is (n + 1)/(m + n + 2), and,
more generally, that the probability of drawing a further p white and q
black balls in (p + q) draws is

$$\binom{n+p}{p} \binom{m+q}{q} \Big/ \binom{m+n+p+q+1}{p+q}.$$

In the fourth article, “De la mesure des vérités auxquelles peut conduire
le calcul des probabilités” [pp. 79-100], we find a discussion of what Condorcet
accepts as grounds for considering events to be equally possible,
viz.


l’égale possibilité des événemens n’a été pour nous que l’ignorance
absolue des causes qui peuvent déterminer un événement
plutôt qu’un autre. Enfin cette définition a supposé encore l’ignorance
de l’événement que l’on considère, soit que cette ignorance
naisse de l’impossibilité où nous sommes de connaître les
événemens futurs, soit que l’événement étant actuel ou passé
nous soit inconnu par d’autres causes. [p. 80]
Condorcet also ties up probability with belief by noting that the greater
the probability of an event, the greater our reason for believing (“motif de
croire”) in its occurrence should be³⁶.

In the fifth article, “Sur la manière de comparer entre eux des événemens
de probabilités différentes, et de trouver une valeur moyenne qui puisse
représenter les valeurs différentes entre elles d’événemens inégalement probables”
[pp. 100-120], Condorcet attributes the invention of the probability
calculus to Pascal and Fermat³⁷, and then, in a moment of perhaps justifiable
pride, says


cette remarque n’est pas inutile; elle peut servir à réfuter ceux
qui se plaisent à répéter que la nature a refusé le don de l’invention,
et n’accorde que celui de perfectionner aux hommes qui
naissent entre Perpignan et Dunkerque. [p. 100]


Nothing else from this monograph seems pertinent³⁸.


6.8 Appendix 6.1 


I can find no trace of a work entitled “Sur les événements futurs” [1803] 
attributed by Keynes [1921] to Condorcet. Keynes may have taken the 
reference from the bibliography in Laurent [1873]. 
The evaluation of the (n − 1)-fold integral


fof ed ewe ena day dena 


may be effected by first writing Z as $[1 - a_1 z_1 - a_2 z_2 - \cdots - a_n z_n]^{-1}$, where
$a_i = c x_i/(1 - c + c x_i)$. On expanding this multinomial we obtain, as in our
earlier work,


1 nr 

i ay ty tn—1 a 

) ( ie zy... 2 (1 — 21 — + — en)" , 
g=1 


: Dac acc Des 
(i) 


where each $i_j$ is a non-negative integer and where the multinomial coefficient
is given more generally for positive integral a by

$$\binom{-a}{i_1, i_2, \ldots, i_n} = (-1)^{i_1+i_2+\cdots+i_n}\, \frac{a(a+1)\cdots(a+i_1+i_2+\cdots+i_n-1)}{i_1!\, i_2! \cdots i_n!}$$


(cf. Feller [1968, p. 66]). As before, the integral then becomes a Dirichlet 


integral. 


Laplace 


Looke within; within is the fountaine of all
good. Such a fountaine, where springing 
waters can never fail, so that thou digge 
still deeper and deeper. 


Marcus Aurelius Antoninus. 


7.1 Introduction 


Pierre Simon, Marquis de Laplace¹ (1749-1827) was a prolific writer on a
wide range of scientific and mathematical topics. The analytic table in the
Œuvres complètes de Laplace covers 56 pages, and Stigler [1978, p. 235]
has indicated that there are in fact some writings by Laplace not included 
in this collection. I have not, of course, read all of Laplace’s works (a feat 
beside which even the labours of Hercules would seem like child’s play) but 
it is hoped that the present coverage is fairly complete. 

Some dozen memoirs² have been identified as being pertinent to the
present work, ranging from two early papers published in 1774 to the third 
edition of 1820 of the magnum opus Théorie analytique des probabilités. Of 
course, much of the early material is reprinted in the latter classic, yet it 
is, I think, of interest to examine the memoirs in chronological order, that 
some idea might be gained of the passage of Laplace’s thought on Bayesian 
inference and methods. From each memoir we shall consider, in the main, 
only those parts specific to our topic. 


7.2 Sur les suites récurro-récurrentes 


This paper, fully entitled “Mémoire sur les suites récurro-récurrentes et sur 
leurs usages dans la théorie des hasards”, was published in the Mémoires de 
l’Académie royale des Sciences de Paris (Savants étrangers), Vol. VI [1774], 
pp. 353-371, and contains, strictly speaking, nothing pertinent. The only 
point worth noting (in the context of the present work) is the appearance of 
an early “definition” of probability³ (framed by Laplace as a “Principe”),
that is,


La probabilité d’un événement est égale à la somme des produits
de chaque cas favorable par sa probabilité divisée par
la somme des produits de chaque cas possible par sa probabilité,
et si chaque cas est également probable, la probabilité de
l’événement est égale au nombre des cas favorables divisé par
le nombre de tous les cas possibles. [pp. 10-11]


(Page numbers refer to the 1878-1912 Œuvres complètes edition of Laplace’s
works unless otherwise stated.) 

We shall not enter into a discussion of equipossibility (or equiprobability) 
(an assumption to which Laplace was habituated (see Gillispie [1972, p. 7])) 
here: suffice to say that, while Laplace is often viewed as the originator of 
this term, Hacking [1975, p. 122] traces it back to Leibniz in 1678 (op. cit., 
pp. 125, 127). Notice too that this principle is framed initially for cases that 
are not postulated to be equiprobable: this latter idea is only introduced in 
the second clause. (One might perhaps see in the first part of the principle 
the framing of the probability of an event in terms of the probabilities of 
elementary events.) 


7.3 Sur la probabilité des causes 


This “Mémoire sur la probabilité des causes par les événements”, the
first paper⁴ in which Laplace discussed the probabilities of causes, was
published⁵ in 1774 in the sixth volume of the Mémoires de l’Académie
royale des Sciences de Paris (Savants étrangers). The memoir is in seven
sections: since many of them contain relevant material, and since “scarcely 
any of the present memoir is reproduced by Laplace in his Théorie ... 
des Prob.” (Todhunter [1865, art. 880]), we choose to give it rather more 
attention than it perhaps merits in the corpus of Laplace’s works. 
The essay opens with the following well-known words: 


La théorie des hasards est une des parties les plus curieuses et
les plus délicates de l’Analyse, par la finesse des combinaisons
qu’elle exige et par la difficulté de les soumettre au calcul.


[p. 27] 


After mentioning certain other of his memoirs, Laplace explains the purpose 
of the present one as follows: 


je me propose de déterminer la probabilité des causes par les
événements, matière neuve à bien des égards et qui mérite
d’autant plus d’être cultivée que c’est principalement sous ce
point de vue que la science des hasards peut être utile dans la
vie civile. [p. 28]
The importance of (parts of) this memoir to our present theme cannot
be overstressed: indeed Todhunter says:


This memoir is remarkable in the history of the subject, as being 
the first which distinctly enunciated the principle for estimating 
the probabilities of the causes by which an observed event may 
have been produced. [1865, art. 868] 


However, he goes on to say (loc. cit.) “Bayes must have had a notion of
the principle ...”, an assertion the reason for which is by no means clear⁶.
Bayes does not explicitly refer to the “probability of causes”, and, as we
shall see later, there is room for doubt as to the exact connexion between
Bayes’s and Laplace’s results (there is no mention of Bayes in the memoir)⁷.

After an introductory Article, Laplace begins the second section of this 
memoir with a careful distinction between those cases in which the event 
(of interest) is uncertain, although the cause on which the probability of 
its occurrence depends is known, and those in which the event is known 
and the cause is unknown [p. 29], that is, a distinction between direct and 
indirect (or inverse) probability. Stating that all problems in “la théorie des 
hasards” may be brought into one or other of these classes, Laplace declares 
his intent to restrict his attention only to those in the second class, to the 
furtherance of which end he asserts⁸ the following fundamental principle⁹:


Principe. — Si un événement peut être produit par un nombre n
de causes différentes, les probabilités de l’existence de ces causes
prises de l’événement sont entre elles comme les probabilités de
l’événement prises de ces causes, et la probabilité de l’existence
de chacune d’elles est égale à la probabilité de l’événement prise
de cette cause, divisée par la somme de toutes les probabilités
de l’événement prises de chacune de ces causes. [p. 29]


In modern notation, this principle states the following two “facts”:

(i) $\Pr[A_i \mid E] \,/\, \Pr[A_j \mid E] = \Pr[E \mid A_i] \,/\, \Pr[E \mid A_j]$,  i, j ∈ {1, 2, ..., n};

(ii) $\Pr[A_i \mid E] = \Pr[E \mid A_i] \,\big/ \sum_j \Pr[E \mid A_j]$,  i ∈ {1, 2, ..., n}.


It is here perhaps that we have the first occurrence of the so-called¹⁰
“Bayes’s Theorem” with a uniform prior, a result that can be stated more
generally as follows:

Let E be an event (of positive probability) which can occur in
conjunction with one of the mutually exclusive and exhaustive
events H_1, H_2, ..., H_n, each of positive probability. Then, for
each i ∈ {1, 2, ..., n},

$$\Pr[H_i \mid E] = \Pr[E \mid H_i]\,\Pr[H_i] \Big/ \sum_{j=1}^{n} \Pr[E \mid H_j]\,\Pr[H_j].$$
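
The discrete statement above can be illustrated with a few lines of exact arithmetic. This is a minimal sketch only: the function name `posterior` and the three likelihood values are hypothetical, chosen merely to show that with equal priors the posterior probabilities are proportional to the likelihoods, as in Laplace's principle.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return Pr[H_i | E] for each i, given Pr[H_i] and Pr[E | H_i]."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("E has zero probability under every hypothesis")
    return [j / total for j in joint]


# Hypothetical example: three causes, equal priors.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
post = posterior(priors, likelihoods)
print(post)  # proportional to the likelihoods: 1/6, 1/3, 1/2
```

With unequal priors the same function gives the general form of the theorem; only the uniform case corresponds to the principle as Laplace states it.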


Several points are worthy of note in connexion with this principle: firstly,
it is tacitly assumed that the prior probabilities of the causes are equal,
and secondly, Laplace refers to “n” causes and uses the word “somme”,
though the applications he indulges in are in fact not discrete. That
the present nice distinction between Σ and ∫ was not observed during
Laplace’s time is of course well known: the point is clearly illustrated in
the Théorie analytique des probabilités, Book II, art. 23, where we find the
sentence


la somme des erreurs à craindre, abstraction faite du signe,
multipliées par leur probabilité, est donc pour toutes les valeurs
de x’, moindres que l, ∫(l − x’)y’ dx’. [p. 339]


One might see then, in this fundamental principle, a continuous analogue
of the above result, viz.,

$$f(x \mid y) = f(y \mid x) \Big/ \int f(y \mid x)\,dx.$$


After applying this principle to a simple urn problem, Laplace proceeds,
in his third article, to a problem¹¹ nearer to our investigation, viz.

Si une urne renferme une infinité de billets blancs et noirs dans
un rapport inconnu, et que l’on en tire p + q billets dont p soient
blancs et q soient noirs; on demande la probabilité qu’en tirant
un nouveau billet de cette urne il sera blanc. [p. 30]


In his solution of this problem, Laplace explains his choice of a (discrete) 
uniform prior in the following way: 


Le rapport du nombre des billets blancs au nombre total des billets
contenus dans l’urne peut être un quelconque des nombres
fractionnaires compris depuis 0 jusqu’à 1. [p. 30]


(At least, as Edwards [1978] has observed¹², Bayes gave an argument for
his assumptions!) Representing this unknown ratio by x, Laplace then says
(correctly) that the probability of drawing p white (or blank) (lottery-)
tickets and q black is $x^p(1-x)^q$. Consequently, by the principle of his
preceding Article (and no additional argument is presented) the probability
that x is the true ratio of the number of white tickets to the total number
of tickets is

$$x^{p}(1-x)^{q}\,dx \Big/ \int_0^1 x^{p}(1-x)^{q}\,dx. \qquad (1)$$


We might notice, in passing, that the binomial coefficients that would be
expected here, were the order in which the tickets were drawn of no importance,
will in fact cancel out in this latter expression. Moreover, although
x is rational, we may assume that the integrand is appropriately extended
to the whole of [0,1] so that the denominator of this expression is well
defined¹³.


Using essentially the result (expressed in a modern notation)

$$\Pr[A \mid B] = \sum_i \Pr[A \mid B \;\&\; C_i]\,\Pr[C_i \mid B],$$

Laplace deduces from (1) that the required probability is

$$\int_0^1 x^{p+1}(1-x)^{q}\,dx \Big/ \int_0^1 x^{p}(1-x)^{q}\,dx,$$

an expression that is shown (by repeated integrations by parts) to reduce
to (p + 1)/(p + q + 2). This result is immediately extended to obtain the
probability of drawing m white and n black tickets, viz.

$$\int_0^1 x^{p+m}(1-x)^{q+n}\,dx \Big/ \int_0^1 x^{p}(1-x)^{q}\,dx
= \frac{(q+1)(q+2)\cdots(q+n)\,(p+1)(p+2)\cdots(p+q+1)}{(p+m+1)(p+m+2)\cdots(p+q+m+n+1)}. \qquad (2)$$


(Once again, if no account is taken of the order in which the (m + n) subsequent
tickets are drawn, this expression should be multiplied by $\binom{m+n}{m}$.)
For ease of future reference, let us denote the ratio (2) by Q(p, q; m, n).
Supposing p and q to be very large, and m and n very small in comparison
with p and q, Laplace shows that this latter probability is approximately

$$p^{m} q^{n} / (p+q)^{m+n}.$$

He then goes on to point out the inadequacy of this approximation for
larger values of m and n; indeed, if m = p and n = q, the probability
should be approximated by

$$p^{p} q^{q} \big/ \sqrt{2}\,(p+q)^{p+q}.$$
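
The ratio Q(p, q; m, n) and its two approximations can be compared numerically. The sketch below is illustrative only: it assumes Q equals the ratio of the two integrals in (2), written here with factorials, and the function name `Q` and the numerical values of p, q, m and n are arbitrary choices.

```python
from fractions import Fraction
from math import factorial, sqrt


def Q(p: int, q: int, m: int, n: int) -> Fraction:
    """Ratio of the integrals of x**(p+m) (1-x)**(q+n) and x**p (1-x)**q on [0, 1]."""
    num = factorial(p + m) * factorial(q + n) * factorial(p + q + 1)
    den = factorial(p + q + m + n + 1) * factorial(p) * factorial(q)
    return Fraction(num, den)


# The case m = 1, n = 0 gives (p + 1)/(p + q + 2).
print(Q(7, 3, 1, 0))

# m, n small compared with p, q: compare with p^m q^n / (p + q)^(m + n).
p, q, m, n = 600, 400, 2, 1
small_exact = float(Q(p, q, m, n))
small_approx = p**m * q**n / (p + q)**(m + n)
print(small_exact, small_approx)

# m = p, n = q: the naive approximation is off by a factor close to 1/sqrt(2).
p, q = 60, 40
naive = Fraction(p**p * q**q, (p + q)**(p + q))
ratio = float(Q(p, q, p, q) / naive)
print(ratio, 1 / sqrt(2))
```

For moderate values such as p = 60 and q = 40 the ratio of the exact value to the naive approximation is already close to 1/√2, which is the correction factor in the second approximation above.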


Laplace next points out that the solution of this problem provides a di- 
rect method of determining the probability of future events after (“d’aprés” ) 
those that have already occurred, but proposes to limit himself to a proof 
of the following theorem: 


On peut supposer les nombres p et q tellement grands, qu’il
devienne aussi approchant que l’on voudra de la certitude que
le rapport du nombre de billets blancs au nombre total des
billets renfermés dans l’urne est compris entre les deux limites
p/(p + q) − ω et p/(p + q) + ω, ω pouvant être supposé moindre
qu’aucune grandeur donnée. [p. 33]
Using the preceding results, Laplace concludes almost immediately that
the probability of the desired ratio’s lying between the specified limits is

$$\int x^{p}(1-x)^{q}\,dx \Big/ \int_0^1 x^{p}(1-x)^{q}\,dx,$$

the integral in the numerator being taken over the region bounded by the
limits p/(p + q) − ω and p/(p + q) + ω. By what Todhunter [1865, art. 871]
calls “a rude process of approximation”, Laplace shows that, for p and q
infinitely large, and ω infinitely less than $(p+q)^{-1/3}$ and infinitely greater
than $(p+q)^{-1/2}$, this probability becomes, approximately¹⁴,

$$E = \frac{(p+q)^{3/2}}{\sqrt{2\pi}\,\sqrt{pq}} \int_0^{\omega} 2\,e^{-(p+q)^{3} z^{2}/2pq}\,dz, \qquad (3)$$

which he goes on to say is approximately 1:


on voit donc qu’en négligeant les quantités infiniment petites,
nous pouvons regarder comme certain que le rapport du nombre
des billets blancs au nombre total des billets est compris
entre les limites p/(p + q) + ω et p/(p + q) − ω, ω étant égal à
$(p+q)^{-1/n}$, n étant plus grand que 2 et moindre que 3, et à
plus forte raison n étant plus grand que 3; partant ω peut être
supposé moindre qu’aucune grandeur donnée. [p. 36]


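He then discusses the error incurred in setting E = 1, concluding in fact
[p. 39] that

$$E = 1 - \frac{\sqrt{pq}}{\omega\sqrt{2\pi}\,(p+q)^{3/2}} \left[ \left(1 + \frac{p+q}{p}\,\omega\right)^{p} \left(1 - \frac{p+q}{q}\,\omega\right)^{q} + \left(1 - \frac{p+q}{p}\,\omega\right)^{p} \left(1 + \frac{p+q}{q}\,\omega\right)^{q} \right]. \qquad (4)$$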


In his fourth article Laplace applies his general principle to what Todhunter
[1865, art. 872] calls “the Problem of Points”, i.e. two players, A
and B, of unknown skills, play a game (e.g. piquet) under the condition
that the first to win n points or matches (“parties”) will win a sum a, laid
down at the outset of the game. Suppose now that the players are forced
to abandon the game at a stage at which A needs f matches and B needs
h matches to win: how should the amount a be divided between the two
players¹⁵?

To solve this, Laplace first states that, were the respective skills of A
and B known, and in the ratio of p to q respectively (where p + q = 1), the
amount that B should receive is

$$a\,q^{f+h-1} \Bigl\{ 1 + \frac{p}{q}(f+h-1) + \frac{p^{2}}{q^{2}}\,\frac{(f+h-1)(f+h-2)}{1.2} + \cdots + \frac{p^{f-1}}{q^{f-1}}\,\frac{(f+h-1)\cdots(h+1)}{1.2.3\cdots(f-1)} \Bigr\}.$$

(This result is stated to have been proved “dans plusieurs Ouvrages”, including
one of his own earlier memoirs of 1773.) Following Todhunter [1865, art.
873], let us denote this amount by φ(p, f, h).

Once again Laplace cavalierly concludes that ignorance (this time of the
players’ skills) should be reflected in the choice of a uniform distribution,
his exact words being


puisque la probabilité de A pour gagner une partie est inconnue,
nous pouvons la supposer un des nombres quelconques, compris
depuis 0 jusqu’à 1. [p. 40]


Let us represent this unknown probability by x; then the probability that,
in 2n − f − h matches, A and B will win n − f and n − h respectively is

$$x^{n-f}(1-x)^{n-h}.$$

Hence, by his fundamental principle, “la probabilité de la supposition que
nous avons faite pour x” is

$$x^{n-f}(1-x)^{n-h}\,dx \Big/ \int_0^1 x^{n-f}(1-x)^{n-h}\,dx.$$

Now the amount B ought to receive is φ(x, f, h) when x is the probability
that A wins a match, and hence the amount B ought to receive is

$$\int_0^1 x^{n-f}(1-x)^{n-h}\,\varphi(x, f, h)\,dx \Big/ \int_0^1 x^{n-f}(1-x)^{n-h}\,dx.$$


This expression is then evaluated. 
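
The evaluation can be sketched with exact arithmetic for a unit stake (a = 1). The sketch assumes, as a reading of the formula above, that B's share for a known skill x is the probability that A wins at most f − 1 of the next f + h − 1 matches; the function names are hypothetical and this is not a transcription of Laplace's own evaluation.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(a: int, b: int) -> Fraction:
    """Exact value of the integral of x**a * (1 - x)**b over [0, 1]."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def share_known_skill(x: Fraction, f: int, h: int) -> Fraction:
    """B's share of a unit stake when A wins each match with probability x."""
    r = f + h - 1
    return sum(comb(r, k) * x**k * (1 - x)**(r - k) for k in range(f))


def share_unknown_skill(n: int, f: int, h: int) -> Fraction:
    """B's share when x is unknown and A, B have won n - f and n - h matches."""
    r = f + h - 1
    won_a, won_b = n - f, n - h
    total = sum(comb(r, k) * beta_int(won_a + k, won_b + r - k)
                for k in range(f))
    return total / beta_int(won_a, won_b)


# A has won 1 match and B none; A needs 1 more, B needs 2 more.
print(share_unknown_skill(2, 1, 2))  # 1/6
```

When the two players stand level (f = h) the function returns 1/2, as symmetry requires, and exchanging the roles of the players gives complementary shares.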

In the fifth article Laplace applies his preceding results to the theory of 
errors: this is the first of Laplace’s works on this important topic¹⁶, the
problem posed here being the following: 


Problème III — Déterminer le milieu que l’on doit prendre entre
trois observations données d’un même phénomène. [p. 42]


As a consequence of this restriction to three values, Todhunter [1865, art. 
875] somewhat harshly concludes “Thus the investigation cannot be said to 
have any practical value”: however, when one appreciates the complexity 
of the solution, one cannot but admire Laplace. 

Laplace takes as the density of the errors of observations the function
y = φ(x), a function that he supposes, firstly, to be even, to decrease
asymptotically to zero as x → +∞ or x → −∞, and to have unit area.
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FIGURE 7.1. Instants at which an astronomical event is recorded. 


Let a,b,c be points on the line segment AB (see Figure 7.1) representing 
the instants at which a certain astronomical event has been recorded. Let 
p and q be the time (in seconds) between a and b and b and c respectively. 
Then 


on demande à quel point V de la droite AB on doit fixer le 
milieu que l’on doit prendre entre les trois observations a, b et c. 
[p. 42] 


If v is “le véritable instant du phénomène”, at a distance x from a, the 
probability of realizing the given sequence of observations is¹⁷ 

\[ y = f(x) = \varphi(x)\,\varphi(p-x)\,\varphi(p+q-x) , \qquad (5) \]

with a similar result (with x′ replacing x) for any other v′. By the first part 
of the fundamental principle, the probabilities of the two hypotheses are in 
the ratio 

\[ \varphi(x)\,\varphi(p-x)\,\varphi(p+q-x) : \varphi(x')\,\varphi(p-x')\,\varphi(p+q-x') . \]

(The more modern approach would be to take φ(x − v) as the density of 
deviations from v, to replace the above ratio (for the observations x₁, x₂ 
and x₃) by 

\[ \varphi(x_1-v)\,\varphi(x_2-v)\,\varphi(x_3-v) : \varphi(x_1-v')\,\varphi(x_2-v')\,\varphi(x_3-v') , \]

and to use maximum likelihood estimation. For further discussion of this 
point see Sheynin [1977, p. 3] and Stigler [1986a, pp. 105-109].) 

Wishing to find the mean, Laplace points out that one may intend one 
of two things¹⁸: 


La première est l’instant tel qu’il soit également probable que 
le véritable instant du phénomène tombe avant ou après: on 
pourrait appeler cet instant milieu de probabilité. 

La seconde est l’instant tel qu’en le prenant pour milieu, la 
somme des erreurs à craindre, multipliées par leur probabilité, 
soit un minimum, on pourrait l’appeler milieu d’erreur ou milieu 
astronomique. [p. 44] 

FIGURE 7.2. Illustration showing that the posterior median minimizes the 
posterior expected error. 


He next shows that the results obtained under these two conditions are 
equivalent. The first choice clearly leads to the median of (the posterior) 
f(x) as defined above (a uniform prior is tacitly assumed). To find the 
second mean, it is necessary to choose a point v (see Figure 7.2) such that 

\[ \int |x - v|\, f(x)\,dx = \text{minimum}. \]

This choice in fact yields exactly the same choice of mean as the first, 
a fact that can be seen in the following way. Let u − v = dx, and 
let Q (respectively P) be the centre of gravity of the mass (“partie”) uOL 
(respectively vOH), with abscissa a distance z (respectively z′) from Ov 
(or Ou). Let M and N be the respective masses. 

The sum of the ordinates multiplied by their distances from the chosen 
point v is then 

\[ Mz + Nz' + (y\,dx)\left(\tfrac12\,dx\right) . \]

Similarly, taking u for this mean, we obtain 

\[ M(z - dx) + N(z' + dx) + (y\,dx)\left(\tfrac12\,dx\right) . \]

The second criterion implying that the difference between these two 
expressions should be zero, we find that M = N; that is, Ov divides the 
area under the curve into two equal parts. 
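
The equivalence of the two means can also be checked numerically. The sketch below is an illustration only: it assumes, purely for the sake of a concrete curve, the double-exponential form of the error law that appears later in this section, with illustrative values m = 1, p = 2, q = 1, and it replaces the integrals by sums over a grid of midpoints. All names are hypothetical.

```python
import math


def posterior(x, m=1.0, p=2.0, q=1.0):
    """Unnormalised curve exp(-m(|x| + |p - x| + |p + q - x|)); values are illustrative."""
    return math.exp(-m * (abs(x) + abs(p - x) + abs(p + q - x)))


def midpoints(lo, hi, n):
    """Midpoints of n equal cells on [lo, hi], together with the cell width."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)], step


def weighted_median(xs, ws):
    """First grid point at which the cumulative weight reaches half the total."""
    half, running = sum(ws) / 2.0, 0.0
    for x, w in zip(xs, ws):
        running += w
        if running >= half:
            return x
    return xs[-1]


def expected_abs_error(v, xs, ws):
    """Discrete analogue of the integral of |x - v| f(x) dx (up to a constant)."""
    return sum(w * abs(x - v) for x, w in zip(xs, ws))


xs, step = midpoints(-9.0, 12.0, 1050)
ws = [posterior(x) for x in xs]
milieu_de_probabilite = weighted_median(xs, ws)
milieu_astronomique = min(xs, key=lambda v: expected_abs_error(v, xs, ws))
print(milieu_de_probabilite, milieu_astronomique)
```

On such a grid the two values should agree to within one cell width, which is the discrete counterpart of the condition M = N.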

Sheynin [1977, p. 4] in fact states that the requirement of the second 
criterion is that 

\[ F(v) = \int_{-\infty}^{\infty} |x - v|\, f(x)\,dx \]

should be minimized. It follows from this that, for any α, β such that 
−∞ < α < v < β < ∞, 

\[ \frac{dF(v)}{dv} = \frac{d}{dv}\left[ \int_{-\infty}^{\alpha} g(x,v)\,dx + \int_{\alpha}^{\beta} g(x,v)\,dx + \int_{\beta}^{\infty} g(x,v)\,dx \right] = 0 , \]

where g(x, v) = |x − v| f(x). Thus 

\[ \int_{-\infty}^{v} f(x)\,dx = \int_{v}^{\infty} f(x)\,dx , \]

or v is the median of f(·): 


On voit donc que le milieu astronomique ne diffère point de 
celui de probabilité, et que l’un et l’autre se déterminent par 
l’ordonnée OV qui divise l’aire de la courbe HOL en deux parties 
égales. [p. 45] 


(Laplace’s V is our v.) 

Fixing his attention on the line Ov, which divides the area under the 
curve into halves, Laplace points out that the finding of this ordinate 
requires knowledge of φ(x), and adduces arguments that lead to¹⁹ 

\[ \varphi(x) = \frac{m}{2}\, e^{-mx} . \qquad (6) \]

(This, of course, is really derived under the assumption that x > 0: in 
general the form 

\[ \varphi(x) = \frac{m}{2}\, e^{-m|x|} , \quad -\infty < x < \infty \qquad (7) \]

would result.) Using this (further details may be found in Sheynin [1977]), 
Laplace shows that the area S under the curve is given by 

\[ S = \frac{m^2}{8}\, e^{-m(p+q)} \left( 1 - \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) \]

(though in fact this seems to give only half the total area), and hence x, 
the abscissa of v, is found to be 

\[ x = p + \frac{1}{m} \ln\left( 1 + \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) . \]

For small values of m, x ≈ (2p + q)/3 (i.e. the arithmetic mean). Further 
discussion of this point may be found in Sheynin (op. cit.) and Stigler 
[1986a, p. 112], and we need say nothing more about it here. 
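
The behaviour of this abscissa for small m is easily examined. The following Python sketch is an illustration under stated assumptions rather than anything in Laplace: it takes the expression x = p + (1/m) ln(1 + e^{-mp}/3 − e^{-mq}/3), supposes p ≥ q so that the median falls between the first two observations, and compares it with (2p + q)/3. The function names are hypothetical.

```python
import math


def laplace_median(m, p, q):
    """Abscissa x = p + (1/m) ln(1 + e^{-mp}/3 - e^{-mq}/3), assuming p >= q."""
    # expm1 keeps the difference of the two exponentials accurate for tiny m
    inner = (math.expm1(-m * p) - math.expm1(-m * q)) / 3.0
    return p + math.log1p(inner) / m


def arithmetic_mean(p, q):
    """Mean of the three instants 0, p and p + q, measured from a."""
    return (2.0 * p + q) / 3.0


for m in (1e-6, 0.1, 1.0, 10.0):
    print(m, laplace_median(m, 2.0, 1.0), arithmetic_mean(2.0, 1.0))
```

As m decreases the first column of results should approach the arithmetic mean, while for large m it should move towards the middle observation p.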

What is, however, more germane to our present investigation is the case 
in which the parameter m is unknown. In this connexion Laplace writes 
[pp. 48-49] 


D’après le principe fondamental de l’Article II, les probabilités 
des différentes valeurs de m sont entre elles comme les probabilités 
que, ces valeurs ayant lieu, les trois observations auront 
les distances respectives qu’elles ont entre elles. Or les probabilités 
que les trois observations a, b et c ... s’éloigneront les 
unes des autres aux distances p et q sont entre elles comme les 
aires des courbes HOL, correspondantes aux différentes valeurs 
de m, comme il est facile de s’en assurer. D’où il résulte, par le 
principe de l’Article II, que la probabilité de m est proportionnelle 
à 

\[ m^2 e^{-m(p+q)} \left( 1 - \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) dm . \]

To prove this assertion it is necessary firstly to recall expression (5), viz. 

\[ y = f(x) = \varphi(x)\,\varphi(p-x)\,\varphi(p+q-x) . \]

Sheynin [1977] chooses to interpret f as the conditional probability density 
function f(x, m | p, q) where, using (5) and (7), 

\[ f(x, m \mid p, q) = \frac{m^3}{8} \exp\{ -m(|x| + |p - x| + |p + q - x|) \} . \]

It then follows from the formula of total probability (for the continuous 
case) that, in Sheynin’s terminology, 

\[ \Pr[m] = \int_{-\infty}^{\infty} f(x, m \mid p, q)\,dx \]

and, as Laplace noted, Pr[m = 0] = 0. 

The argument in Stigler [1986a, pp. 112-113] runs as follows: interpreting 
f in (5) as f(x, p, q | m), one has 

\[ f(p, q \mid m) = \int_{-\infty}^{\infty} f(x, p, q \mid m)\,dx . \]

Thus, by the Principle, 

\[ f(m \mid p, q) \propto \int_{-\infty}^{\infty} f(x, p, q \mid m)\,dx . \]

Notice that this latter integral can be written as 

\[ \frac{f(p, q)}{f(m)} \int_{-\infty}^{\infty} f(x, m \mid p, q)\,dx , \]

and compare this expression with that given by Sheynin. 
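
The dependence on m of the integral over x can be verified directly. The sketch below is a numerical illustration only: it assumes the joint density (m³/8) exp{−m(|x| + |p − x| + |p + q − x|)}, integrates it over x by a midpoint rule with the tails cut off, and compares the result with (m²/4) e^{-m(p+q)}(1 − e^{-mp}/3 − e^{-mq}/3). The constant 1/4 here refers to the whole area; only the dependence on m matters for the posterior of m. The names and the cut-off at 40/m are assumptions of the sketch.

```python
import math


def joint_density(x, m, p, q):
    """Product of three densities (m/2) exp(-m|e|) for errors x, p - x and p + q - x."""
    return (m ** 3 / 8.0) * math.exp(-m * (abs(x) + abs(p - x) + abs(p + q - x)))


def marginal_numeric(m, p, q, n=200_000):
    """Midpoint-rule integral of the joint density over x (tails cut at 40/m)."""
    lo, hi = -40.0 / m, p + q + 40.0 / m
    step = (hi - lo) / n
    return step * sum(joint_density(lo + (i + 0.5) * step, m, p, q) for i in range(n))


def marginal_closed_form(m, p, q):
    """(m^2/4) e^{-m(p+q)} (1 - e^{-mp}/3 - e^{-mq}/3), the whole area under the curve."""
    bracket = 1.0 - math.exp(-m * p) / 3.0 - math.exp(-m * q) / 3.0
    return (m ** 2 / 4.0) * math.exp(-m * (p + q)) * bracket


print(marginal_numeric(1.5, 2.0, 1.0), marginal_closed_form(1.5, 2.0, 1.0))
```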
Still assuming m to be unknown, Laplace now turns his attention to the 
determination of the “best” x: 


si l’on nomme y la probabilité, correspondante à m, que le 
véritable instant du phénomène tombe à la distance x du point 
a, la probabilité entière que cet instant tombera à cette distance 
sera proportionnelle à 

\[ \int y\, m^2 e^{-m(p+q)} \left( 1 - \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) dm , \]

l’intégrale étant prise de manière qu’elle commence lorsque m = 
0, et finisse lorsque m = ∞; si donc on construit sur l’axe AB 
une nouvelle courbe H′KL′ dont les ordonnées soient proportionnelles 
à cette quantité, l’ordonnée KQ qui divisera l’aire de 
cette courbe en deux parties égales coupera l’axe au point que 
l’on doit prendre pour milieu entre les trois observations. [p. 49] 


Laplace’s y seems to be f(x | p, q, m), and the integral in the above 
quotation is then 

\[ \int_0^{\infty} f(x \mid p, q, m)\, f(m \mid p, q)\,dm = \int_0^{\infty} f(x, m \mid p, q)\,dm = f(x \mid p, q) . \qquad (8) \]



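It thus follows, according to Laplace, that the posterior median (μ, say) 
may be found by solving 

\[ \int_{-\infty}^{\mu} \int_0^{\infty} f(x \mid p, q, m)\, f(m \mid p, q)\,dm\,dx = \frac12 \int_{-\infty}^{\infty} \int_0^{\infty} f(x \mid p, q, m)\, f(m \mid p, q)\,dm\,dx . \]

Using (8), this becomes 

\[ \int_{-\infty}^{\mu} f(x \mid p, q)\,dx = \frac12 \int_{-\infty}^{\infty} f(x \mid p, q)\,dx , \]

which is indeed true. 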
However, Laplace goes on to say 


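L’aire de cette nouvelle courbe sera évidemment proportionnelle 
à l’intégrale du produit de l’aire de la courbe HOL par 

\[ m^2 e^{-m(p+q)} \left( 1 - \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) dm . \]

Donc, puisque, pour déterminer x dans une supposition particulière 
pour m, on a 

\[ m^2 e^{-m(2p+q-x)} = m^2 e^{-m(p+q)} \left( 1 + \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) , \]

[...] 

\[ \int m^4 e^{-m(3p+2q-x)} \left( 1 - \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) dm = \int m^4 e^{-2m(p+q)} \left( 1 + \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) \left( 1 - \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) dm , \]

en intégrant de manière que les intégrales commencent lorsque 
m = 0, et finissent lorsque m = ∞. [p. 49] 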


(See Figure 7.3.) The argument now seems to be that²⁰ 

\[ \int_{-\infty}^{\infty} \int_0^{\infty} f(x \mid p, q, m)\, f(m \mid p, q)\,dm\,dx \]

is proportional to 

\[ \int_0^{\infty} \int_{-\infty}^{\infty} f(x, p, q \mid m)\, f(m \mid p, q)\,dx\,dm . \]

Since f(x | p, q, m) = f(x, p, q | m)/f(p, q | m), the first of these double 
integrals in fact becomes 

\[ \int_{-\infty}^{\infty} \int_0^{\infty} f(x, p, q \mid m)\, f(m \mid p, q) / f(p, q \mid m)\,dm\,dx , \]

and it is immediately clear that Laplace’s proportionality “constant” is in 
fact a function of m. Thus the statement at the start of the preceding 
quotation is false, and so therefore is the following statement, viz. since, when 
m is known, μ is given by solving 

\[ \int_{-\infty}^{\mu} f(x, p, q \mid m)\,dx = \frac12 \int_{-\infty}^{\infty} f(x, p, q \mid m)\,dx , \]

it follows that, in this case, 

\[ \int_{-\infty}^{\mu} \int_0^{\infty} f(x, p, q \mid m)\, f(m \mid p, q)\,dm\,dx = \frac12 \int_{-\infty}^{\infty} \int_0^{\infty} f(x, p, q \mid m)\, f(m \mid p, q)\,dm\,dx . \]

We shall find later that the confusion engendered by Laplace’s cavalier 
treatment of conditional probability is not limited to this memoir. Indeed, 
his lack of a precise notion of conditional probability contributes largely to 
the difficulty of reading much of his work. 

In the present case Laplace obtains an equation of fifteenth degree for μ: 


A TA — T 


(83p+2¢—p)> 3(4p+2g—p)? 3838p + 3q—- p)? 
l 1 1 1 > 4 


Tau) kk OD OLAS. OA NS OA OAS 


~ (Qp+2q)>  3(2p+3q)>  9(4p+2q)® 9(4p + 2)? e 9(2p + 4q)° 


He shows further that this equation has exactly one root in the open 
interval (0, p), and also discusses an iterative method for finding it. Stigler 
[1986a, p. 116], by considering the corrected equation²¹ 

\[ \int_0^{\infty} \int_{-\infty}^{\mu} f(x, p, q \mid m)\,dx\,dm = \frac12 \int_0^{\infty} \int_{-\infty}^{\infty} f(x, p, q \mid m)\,dx\,dm , \]

i.e. 

\[ \int_0^{\infty} m^2 e^{-m(2p+q-\mu)}\,dm = \int_0^{\infty} m^2 e^{-m(p+q)} \left( 1 + \tfrac13 e^{-mp} - \tfrac13 e^{-mq} \right) dm , \]

obtains the cubic equation 

\[ (2p + q - \mu)^{-3} = \left[ (p+q)^{-3} + (2p+q)^{-3}/3 - (p+2q)^{-3}/3 \right] , \]

whose roots in fact turn out to be even further from the corrections giving 
the arithmetic mean than do Laplace’s. 
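
The root of the corrected cubic is available in closed form, and a short sketch may help the reader to see how far it lies from the arithmetic mean. The Python below is illustrative only: it assumes the cubic in the form (2p + q − μ)⁻³ = (p + q)⁻³ + (2p + q)⁻³/3 − (p + 2q)⁻³/3 with p ≥ q > 0, and the function name is hypothetical.

```python
def corrected_median(p, q):
    """Real root mu of (2p+q-mu)^-3 = (p+q)^-3 + (2p+q)^-3/3 - (p+2q)^-3/3, for p >= q > 0."""
    rhs = (p + q) ** -3 + (2 * p + q) ** -3 / 3.0 - (p + 2 * q) ** -3 / 3.0
    return 2 * p + q - rhs ** (-1.0 / 3.0)


p, q = 2.0, 1.0
print(corrected_median(p, q))   # median when m is integrated out
print((2 * p + q) / 3.0)        # arithmetic mean of 0, p and p + q
```

When p = q the three observations are symmetric about the middle one, and the root should then coincide with p.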

Further comment on this problem may be found in Barnard [1988]. Here 
it is supposed that the time μ of a given event is to be estimated from three 
observations x₁, x₂ and x₃. Writing the errors of observation as 

\[ p_i = (x_i - \mu)/\sigma , \]

Barnard transforms Laplace’s joint density of the p_i to 

\[ \varphi(p) = \tfrac18\, e^{-(|p_1| + |p_2| + |p_3|)} \]

(cf. our earlier f(x, m | p, q)). 

Turning to Laplace’s problem of finding that function g(x₁, x₂, x₃) which 
is such that the true value μ is as likely to fall short of g as to exceed it, 
Barnard notes that Laplace essentially assumes the joint prior 

\[ \varphi(p)\,dp\,d\mu\,d\sigma , \]

that is, a uniform prior density element for μ and σ. If one wishes to allow 
an arbitrary prior for these parameters, one should rather consider 

\[ \varphi(p)\,\pi(\mu, \sigma)\,dp\,d\mu\,d\sigma . \]

The value of g obtained by Laplace is seen to be found in this case by 
taking π(μ, σ) ∝ 1/σ (i.e. the Jeffreys non-informative prior) rather 
than using the uniform prior adopted by Laplace. 

At the start of the sixth article Laplace poses the following problem²²: 


je suppose que A joue avec B à croix ou pile, à ces conditions: 
savoir que, si A amène croix au premier coup, B lui donnera 
deux écus; qu’il lui en donnera quatre s’il ne l’amène qu’au 
second, huit s’il ne l’amène qu’au troisième, et ainsi de suite 
jusqu’au nombre x de coups. [pp. 53-54] 


In solving this problem Laplace supposes initially that the probability of a 
cross (i.e. a “head”) is (1+w)/2. Then A’s expectation is 

\[ (1+w)\left[ 1 + (1-w) + (1-w)^2 + \cdots + (1-w)^{x-1} \right] = (1+w)\left[ 1 - (1-w)^x \right] / w . \]

A similar expression, mutatis mutandis, is given for the case in which the 
probability of a cross is (1 − w)/2. Now, says Laplace, as the probability 
(1+w)/2 is as naturally attributed to cross as to pile (i.e. a “tail”), the 
expectation E of A is to be taken as 

\[ E = 1 + (1 - w^2)\left[ (1+w)^{x-1} - (1-w)^{x-1} \right] / 2w , \qquad (9) \]

which reduces, for w so small that powers of w higher than w² may be 
neglected, to 

\[ E = x + \left\{ \frac{(x-1)(x-2)(x-3)}{1\cdot 2\cdot 3} - (x-1) \right\} w^2 . \]

If one supposes that w may take on equally any one of the values in the 
interval (0, 1/q), one finds A’s total expectation by multiplying (9) by q 
and integrating. 
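
These expressions may be checked by direct summation. The sketch below is an illustration, not Laplace's derivation: it assumes that 2ᵏ écus are paid when the first cross appears on the kth of at most x throws, computes the expectation for a bias in each direction, and compares their average with the closed form (9) and with its small-w reduction. Exact fractions are used so that the identity can be tested without rounding; the function names are hypothetical.

```python
from fractions import Fraction


def expectation_direct(w, x):
    """Expectation of A when cross has probability (1 + w)/2 and at most x throws are made."""
    head = (1 + w) / 2
    tail = (1 - w) / 2
    # 2**k ecus are paid when the first cross appears on throw k
    return sum(2 ** k * tail ** (k - 1) * head for k in range(1, x + 1))


def expectation_averaged(w, x):
    """Average over the two equally likely directions of the bias."""
    return (expectation_direct(w, x) + expectation_direct(-w, x)) / 2


def formula_9(w, x):
    """Closed form 1 + (1 - w^2)[(1 + w)^(x-1) - (1 - w)^(x-1)]/(2w), for w != 0."""
    return 1 + (1 - w ** 2) * ((1 + w) ** (x - 1) - (1 - w) ** (x - 1)) / (2 * w)


def small_bias_approximation(w, x):
    """x + [(x-1)(x-2)(x-3)/6 - (x-1)] w^2, dropping higher powers of w."""
    return x + (Fraction((x - 1) * (x - 2) * (x - 3), 6) - (x - 1)) * w ** 2


w, x = Fraction(1, 100), 6
print(expectation_averaged(w, x), formula_9(w, x), small_bias_approximation(w, x))
```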

The remainder of the memoir is irrelevant to our purposes. However, 
before finishing off this discussion, let us note Laplace’s remarks on the 
choice of a uniform prior²³: he writes 


On suppose dans la théorie [i.e. des probabilités] que les différents 
cas qui amènent un événement sont également probables, ou, 
s’ils ne le sont pas, que leur probabilité est dans un rapport 
donné. Quand on veut ensuite faire usage de cette théorie, on 
regarde deux événements comme également probables, lorsqu’on 
ne voit aucune raison qui rende l’un plus probable que l’autre, 
parce que, quand bien même il y aurait une inégale possibilité 
entre eux, comme nous ignorons de quel côté est la plus grande, 
cette incertitude nous fait regarder l’un comme aussi probable 
que l’autre. 

Lorsqu’il n’est question que de probabilités simples, il paraît que 
cette inégalité de probabilités ne nuit en rien à la justesse de 
l’application du calcul aux objets physiques ... mais, lorsqu’il 
s’agit de probabilité composée, il me semble que l’application 
que l’on fait de la théorie aux événements physiques demande 
à être modifiée. [p. 61] 


7.4 Sur l’intégration des équations différentielles 


The title of this memoir²⁴, viz. “Recherches sur l’intégration des équations 
différentielles aux différences finies et sur leur usage dans la théorie des 
hasards”, is just right, and the actual contents do not concern us here. 
It is, however, of interest to note the general remarks in the twenty-fifth 
article (the first section of the memoir in which probabilistic matters are 
broached), for it is here that we find a clear exposition of the distinction 
Laplace makes between “hasard” and “probabilité” (as well as a discussion 
of moral vs mathematical expectation)²⁵: 


Nous regardons une chose comme l’effet du hasard, lorsqu’elle 
n’offre à nos yeux rien de régulier, ou qui annonce un dessein, 
et que nous ignorons d’ailleurs les causes qui l’ont produite. 
Le hasard n’a donc aucune réalité en lui-même; ce n’est qu’un 
terme propre à désigner notre ignorance sur la manière dont les 
différentes parties d’un phénomène se coordonnent entre elles 
et avec le reste de la Nature. 

La notion de probabilité tient à cette ignorance. Si nous sommes 
assurés que, sur deux événements qui ne peuvent exister ensemble, 
l’un ou l’autre doit nécessairement arriver, et que nous 
ne voyons aucune raison pour laquelle l’un arriverait plutôt 
que l’autre, l’existence et la non-existence de chacun d’eux est 
également probable. [p. 145] 


This is followed by an extension to three events. 

A clear statement follows of the conditions under which probability is to 
be defined as the ratio of the number of favourable cases to the number of 
possible cases, viz. 


la probabilité de l’existence d’un événement n’est ainsi que le 
rapport du nombre des cas favorables à celui de tous les cas 
possibles, lorsque nous ne voyons d’ailleurs aucune raison pour 
laquelle l’un de ces cas arriverait plutôt que l’autre. Elle peut 
être conséquemment représentée par une fraction dont le numérateur 
est le nombre des cas favorables, et le dénominateur celui 
de tous les cas possibles. [p. 146] 


As Hacking [1975, p. 131] has noted, the word “possibilité” does not occur 
in this definition: it is, however, used on p. 149 with almost the sense of a 
physical probability. 

Laplace next gives a precise definition of the purpose of the theory of 
chances, i.e. 


la théorie des hasards a pour objet de déterminer ces fractions 
[i.e. fractions de la certitude], et l’on voit par là que 
c’est le supplément le plus heureux que l’on puisse imaginer 
à l’incertitude de nos connaissances. [p. 146] 


As in the previous memoir, Laplace here draws a distinction between in- 
stances in which the causes are known but the events are to be determined, 
and those in which the events are known but the causes are unknown. The 
latter instances formed the subject of the previous memoir: the probabilis- 
tic parts of the present one are devoted to the former, their discussion being 
in terms of the finite difference methods introduced in the first twenty-four 
articles of the memoir. 


7.5 Recherches sur le milieu 


This memoir, whose title in full is Recherches sur le milieu qu’il faut choisir 
entre les résultats de plusieurs observations, was read before the Académie 
royale des Sciences, Paris in 1777, and remained unpublished²⁶ until 1979. 
Here, in some sense in opposition to Lagrange [1770-1773], Laplace considers 
the application of inverse probability to the determination of the mean 
of a number of observations. 

Having noted Lagrange’s work on the error to be feared in the taking of 
the arithmetic mean of the results of several observations, Laplace states²⁷ 


Le problème dont il s’agit peut être envisagé sous deux points de 
vûe différents suivant que l’on considère les observations avant 
ou après qu’elles sont faites; dans le premier cas, la recherche 
du milieu qu’il faut choisir entre les observations, consiste à 
déterminer a priori la fonction des résultats des observations 
qu’il est le plus avantageux de prendre pour résultat moyen; 
dans le second cas, la recherche de ce milieu consiste à déterminer 
une fonction semblable a posteriori, c’est à dire en ayant égard 
aux distances respectives des observations entre elles. On voit 
facilement que ces deux manières d’envisager le problème doivent 
conduire à des résultats différents; mais il est visible en même 
tems que la seconde est la seule qui doive être employée. 
[p. 229] 


Noting that a number of different things may be meant by “le milieu ou 
résultat moyen” of a number of observations²⁸, Laplace devotes §§II-VI of 
his memoir to the case in which the law of facility of the error is known 
(possibly different laws for each observation), turning in §VII to the case in 
which it is unknown. Before turning to this section, however, we note that 
Laplace again gives here the general principle he had given before, viz. 


Si un événement peut étre produit par un nombre, n, de causes 
ou de suppositions différentes, les probabilités de l’existence de 
ces causes prises de l’événement, sont entre elles comme les 
probabilités que ces causes ayant lieu, l’événement aura lieu 
pareillement, et la probabilité de l’existance de chacune d’elles 
est égale a la probabilité de l’événement prise de cette cause, 
divisé par la somme des probabilités de l’événement prises de 
chacune de ces causes. [p. 241] 


Laplace notes at the start of his seventh section that the most usual case 
is that in which the law of facility of the errors of observation is unknown, 
and suggests that the most natural thing to do is to choose a law that re- 
flects the following two criteria: (a) positive and negative errors are equally 
likely, and (b) the facility (of the absolute values) of the errors decreases 
as (the magnitudes of) the errors increase. There are, of course, an infinite 
number of such possible laws, each leading to a (different) mean. The prob- 
lem of taking the mean of all these means is completely new, and calls for 
particular cunning. 

FIGURE 7.4. General (MRN) and uniform (ZTV) posterior distributions for the 
determination of the mean. 

Let, then, the errors all fall in the interval [−h, h], and let a and a^{(n−1)} 
respectively denote the smallest and largest observations taken. Two points 
M and N are then determined by aM = h and a^{(n−1)}N = h. Suppose further 
that the laws of facility of the errors of the observations are (possibly, 
though not necessarily) different. A “courbe des probabilités” (MRN) of 
the “véritable instant” (V, say) is then constructed (this is a posterior 
distribution) and, under the assumption that the chosen system (S₁, say) of 
laws in fact obtains, 


la probabilité que le point P est le véritable instant du phénomène, 
est égale à l’ordonnée PR, divisée par l’aire entière MRN 
[p. 242], 


which one could write as 

\[ \Pr[P = V \mid S_1] = (PR)/\mathrm{area}(MRN) . \qquad (10) \]

Noting that area(MRN) = Pr[S₁], Laplace says that (10) can be written 
as 

\[ (PR) = \Pr[P = V \mid S_1]\,\Pr[S_1] . \]

Repetition of this exercise for each possible system S_i, i ∈ {1, 2, ..., K}, 
(with corresponding distinguished ordinate (PR)_i, say) then results, it is 
claimed, in 

\[ \frac{1}{K} \sum_i (PR)_i = \Pr[P = V] , \]



the actual words being 


la somme de toutes les ordonnées PR, correspondantes à chaque 
sistéme, divisée par le nombre des sistémes, pourra réprésenter 
la probabilité que le point P est le véritable instant du phénom- 
éne. [p. 243] 


Now the above symbolizing of Laplace’s verbal argument is at best specious; 
one may proceed more accurately as follows: let MR_iN denote the curve 
corresponding to the ith system, and let r_i denote the ordinate PR_i. Then 

\[ \Pr[P < V < P + dP \mid S_i] = r_i\,dP . \qquad (11) \]

It should, however, be noted that (11) differs from (10) in not having the 
divisor “area(MRN)” on the right-hand side. 

FIGURE 7.5. Choice of points from a law of facility. 

Assuming that the same probability element is used for each of the K 
systems, and that each system is equally probable, one has 

\[ \frac{1}{K} \Bigl( \sum_i r_i \Bigr) dP = \frac{1}{K} \sum_i \Pr[P < V < P + dP \mid S_i] = \sum_i \Pr[P < V < P + dP \mid S_i]\,\Pr[S_i] = \Pr[P < V < P + dP] , \]

which is Laplace’s result. 
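
The averaging over equally probable systems can be made concrete with a small sketch. The Python below is an illustration only: the two curves used (a triangular and a flat law on [−1, 1], each of unit area) are hypothetical stand-ins for two systems S₁ and S₂, and the check is simply that the equally weighted sum of the ordinates r_i again has unit area, as a probability curve must.

```python
def mixture_ordinate(ordinates, prior=None):
    """Sum of r_i * Pr[S_i]; with equal priors this is the plain average of the r_i."""
    k = len(ordinates)
    prior = prior if prior is not None else [1.0 / k] * k
    return sum(r * w for r, w in zip(ordinates, prior))


def triangular(x, h=1.0):
    """A unit-area curve on [-h, h] peaked at 0 (hypothetical system S_1)."""
    return max(0.0, (h - abs(x)) / h ** 2)


def flat(x, h=1.0):
    """A unit-area constant curve on [-h, h] (hypothetical system S_2)."""
    return 1.0 / (2.0 * h) if abs(x) <= h else 0.0


n = 2000
step = 2.0 / n
grid = [-1.0 + (i + 0.5) * step for i in range(n)]
combined = [mixture_ordinate([triangular(x), flat(x)]) for x in grid]
area = step * sum(combined)
print(area)
```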

It follows that if Σ r_i is known for every abscissa P on MN, then a 
new “courbe des probabilités” may be formed, one that can be seen as the 
true curve, while the position of the desired mean will be determined by 
the ordinate xy that divides the area under this true curve in half. The 
question is thus reduced to the determination of the r_i. 

To this end, consider a specific system S_i with distinguished ordinate r_i 
at P. Then r_i is equal to the product of the probability that the error in 
the first observation is f = −x, with the probability that the error in the 
second observation is q^{(1)} − x, with the probability that the error in the 
third observation is q^{(2)} − x, etc. (This is discussed in the sixth section of 
the memoir, and it in fact leads to the construction of the curve MRN.) 
The sum of all these products is Σ_i r_i, a sum that can also be expressed as 
the product of (a) the sums of all the probabilities that the error in the first 
observation is −x, with (b) the sums of all the probabilities that the error 
in the second observation is q^{(1)} − x, etc.²⁹ The problem is thus reduced to 
the finding of the sum of the probabilities that the error in any observation 
whatsoever is f. 

Thus let φ(f) be the law of facility of the errors of the observation, and 
construct the curve MHN so that if AP = f, the ordinate PL = φ(f). 
Now AM = AN = h, and so AH divides the area under the curve into 
two completely similar parts. Any ordinate PL then represents the 
probability of the corresponding error AP, while HAN = 1/2. Suppose now 
that 1/2 is divided into an infinite number m of infinitely small masses 
(“parties”), each ordinate, like PL, containing an infinite number of these 
masses. These m masses are distributed in all possible ways over the points 
of AN, in such a way that, for each combination, the smaller AP is, the 
greater the number of masses the ordinate PL contains. 
Then the probability of the error AP will be the sum of the ordinates PL 
corresponding to each combination, divided by the total number of 
combinations. Note that it is only necessary to consider as different combinations 
those that give one or more different ordinates. 

Consider the straight line AN on which a finite number h of points are 
marked, and suppose that a “nombre infini” m of masses are to be 
distributed over these points³⁰. If there is only one point A on the line the 
number of arrangements (“combinaisons”) is 1. If there are two points A 
and B, any number of masses in {1, 2, ..., m/2} may be put above B, the 
number of arrangements then being m/2. In the case of three points A, B 
and C, let z be the number of masses put above C. Then there must be at 
least z masses over each of A and B, and so (m − 3z) masses remain to be 
distributed over A and B. The number of arrangements, by the preceding 
case, is then (m − 3z)/2, and the number of different arrangements is then 

\[ \int_0^{m/3} \frac{m - 3z}{2}\,dz = \frac{m^2}{12} . \]

Proceeding in this way one finds that the number of arrangements when h 
points are chosen on AN is 

\[ \frac{m^{h-1}}{[(h-1)!]^2\, h} . \]
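
This count may be compared with an exact enumeration for finite m. The sketch below rests on an interpretation that is ours and not Laplace's wording: an arrangement is taken to be a way of writing m as a sum of at most h non-increasing whole numbers (the heights over A, B, C, ...), that is, a partition of m into at most h parts. The exact count is obtained by a standard dynamic programme and set beside m^{h−1}/([(h−1)!]² h), which should agree only in the limit of large m.

```python
from math import factorial


def count_arrangements(m, h):
    """Number of ways to split m masses over h points with non-increasing heights.

    This equals the number of partitions of m into at most h parts, which in turn
    equals the number of partitions of m into parts of size at most h.
    """
    ways = [1] + [0] * m
    for part in range(1, h + 1):
        for total in range(part, m + 1):
            ways[total] += ways[total - part]
    return ways[m]


def laplace_count(m, h):
    """The continuous approximation m^(h-1) / ([(h-1)!]^2 h)."""
    return m ** (h - 1) / (factorial(h - 1) ** 2 * h)


for h in (2, 3):
    m = 3000
    print(h, count_arrangements(m, h), laplace_count(m, h))
```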


Supposing next that the number of masses that can appear above A 
may not exceed p, Laplace shows that Γ(h, m, p), the number of possible 
arrangements, is 


the term in crotchets terminating when one of the terms (m − p), (m − 2p), 
etc. becomes negative. 

Next (see Figure 7.5) suppose that AN again contains a finite number 
h of points while PL contains p masses. Let n be the number of points 
preceding P, each of which will contain at least p masses, say np + μ 
altogether. Then m − p(n+1) − μ masses lie on the other side of PL, and 
are spread over h − (n+1) points. The number of arrangements of the 
masses over the points that precede P is, as one has seen before, 

\[ \frac{\mu^{n-1}}{[(n-1)!]^2\, n} , \qquad (12) \]

while the number of arrangements of the remaining m − p(n+1) − μ masses 
over the h − n − 1 points that succeed P is 

\[ \Gamma(h - n - 1,\; m - p(n+1) - \mu,\; p) . \qquad (13) \]

The number of arrangements that give ordinate PL = p at P is then found 
by multiplying (12) and (13) together and integrating, the integral being 


\[ \int \frac{\mu^{n-1}}{D} \sum_i (-1)^i \binom{h-n-1}{i} \bigl( m - p(n+1+i) - \mu \bigr)^{h-n-2}\,d\mu , \]

where D = [(n − 1)!]² n [(h − n − 2)!]² (h − n − 1) and the upper limits of 
integration are m − p(n+1), m − p(n+2), etc. for the first, second, etc. 
terms in the sum. This integration reduces the above expression to 

\[ \frac{1}{n!\,(h-n-1)!\,(h-2)!} \sum_i (-1)^i \binom{h-n-1}{i} \bigl( m - p(n+1+i) \bigr)^{h-2} . \]

The sum of all ordinates at P is then found by multiplying this last 
expression by p and integrating the successive terms from p = 0 to p = m/(n+1), 
p = m/(n+2), etc., a process that yields for this sum the value 

\[ \frac{m^h}{n!\,(h-n-1)!\,h!} \sum_i (-1)^i \binom{h-n-1}{i} \frac{1}{(n+1+i)^2} . \]


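Laplace shows that this may also be written in the form 

\[ \frac{m^h}{[(h-1)!]^2\, h^2} \left[ \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{h} \right] , \]

division of which by m^{h−1}/{[(h − 1)!]² h}, the number of all possible 
arrangements, yields the probability that the error of the observation is AP as 

\[ \frac{m}{h} \left[ \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{h} \right] . \qquad (14) \]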
m i 1 1 
he tac te tt oo) 


Now n and h are supposed to contain an infinite number of points, so 
that 1/h is infinitely small. If we let n + 1 or AP = x, and x/h = z, then 
dz = 1/h and (15) becomes 


oe OF eee ape dz bs +d 
h2dz | z zgt+tdz 2+4+2dz a ih 


FIGURE 7.6. Observations marking the instant of a phenomenon. 

The term in braces above can be written as 

\[ \sum_{n=0}^{(1-z)/k} \frac{k}{z + nk} , \]

where k = dz. Using an integral approximation one has 

\[ \sum_{n=0}^{(1-z)/k} \frac{k}{z + nk} \approx \int_0^{(1-z)/k} \frac{k}{z + kt}\,dt , \]

which is in fact Laplace’s evaluation of the series. Thus we may take the 
probability of the error as ln z⁻¹ = ln(h/x) (or as anything proportional 
to it), a value representing the facility of a positive error x as well as of a 
negative error −x. 
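
The quality of this integral approximation is easily gauged. The following sketch is illustrative only: it assumes that the series in question is 1/x + 1/(x+1) + ... + 1/h with x = n + 1, sums it for a large but finite h, and sets the result beside ln(h/x). The function name and the particular values of h and x are arbitrary choices.

```python
import math


def facility_sum(x, h):
    """Sum 1/x + 1/(x+1) + ... + 1/h, the bracketed series with x = n + 1."""
    return sum(1.0 / j for j in range(x, h + 1))


h = 1_000_000
for x in (1_000, 100_000, 500_000):
    print(x / h, facility_sum(x, h), math.log(h / x))
```

For fixed z = x/h the two columns should draw together as h grows, which is the sense in which the logarithmic law is obtained.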

Suppose now that there are n observations determining the instant at 
which a phenomenon occurs, these occurring at the times a, a′, a″, ... with 
a < a′ < a″ < .... Let am = x and mn = y in Figure 7.6. The curve is 
drawn in such a way that 

\[ y = \ln(h/x) \cdot \ln(h/(q^{(1)} + x)) \cdot \ln(h/(q^{(2)} + x)) \cdot \text{etc.} , \]

this curve being asymptotic to ai. Similarly the curve QnR, asymptotic to 
ai and a′i′, is drawn using 

\[ y = \ln(h/x) \cdot \ln(h/(q^{(1)} - x)) \cdot \ln(h/(q^{(2)} - x)) \cdot \text{etc.} , \]

while SnZ is drawn from 

\[ y = \ln(h/x) \cdot \ln(h/(x - q^{(1)})) \cdot \ln(h/(q^{(2)} - x)) \cdot \ln(h/(q^{(3)} - x)) \cdot \text{etc.} \]

The last curve, VnN, is drawn from 

\[ y = \ln(h/x) \cdot \ln(h/(x - q^{(1)})) \cdot \ln(h/(x - q^{(2)})) \cdot \ln(h/(x - q^{(3)})) \cdot \text{etc.} \]

The point X, which is the mean one wishes to ascertain, is then determined 
by noting that the ordinate XT divides the area MPVN into two equal 
parts. Laplace notes somewhat sadly that³¹ 


dans l’état actuel de l’analyse, la quadrature de ces courbes est 
impossible. [p. 251] 


In the eighth section Laplace notes that since h is infinite and x finite, 
we may replace ln(h/x) by ln h, so that in each of the curves QnR, SnZ, 
etc. the ordinates may be regarded as constant and equal to (ln h)^n (but 
infinite at a, a′ etc.). As regards the extreme curves PnM and VnN, he 
shows that, provided we assume that 0 < x < K, where K is finite but 
extremely large with respect to q^{(1)}, q^{(2)}, etc., the area under PnM is 

\[ K(\ln h)^n + A - \frac{1}{n}(\ln h)^n \sum_{i=1}^{n-1} q^{(i)} , \]

while that under VnN is 

\[ K(\ln h)^n + A - \frac{1}{n}(\ln h)^n \left[ (q^{(n-1)} - q^{(n-2)}) + (q^{(n-1)} - q^{(n-3)}) + \text{etc.} + q^{(n-1)} \right] , \]

where 

\[ A = \int_K^h (\ln h - \ln x)^n\,dx . \]

If x denotes the distance aX from a to X, where XT divides the area 
MPVN into two equal parts, then the area of MPTX is 

\[ x(\ln h)^n + K(\ln h)^n + A - \frac{1}{n}(\ln h)^n \sum_{i=1}^{n-1} q^{(i)} , \]

while that of TXNV is 

\[ (q^{(n-1)} - x)(\ln h)^n + K(\ln h)^n + A - \frac{1}{n}(\ln h)^n \left[ (q^{(n-1)} - q^{(n-2)}) + (q^{(n-1)} - q^{(n-3)}) + \text{etc.} + q^{(n-1)} \right] , \]

and on equating these two one finds that 

\[ x = \frac{1}{n} \sum_{i=1}^{n-1} q^{(i)} , \]

“qui donne le même résultat que la méthode des milieux arithmétiques” 
[p. 253]. 


To conclude this section Laplace notes that 


En général on arrivera à ce résultat, toutes les fois 1°. que la loi 
de facilité des erreurs sera la même pour toutes les observations; 
2°. que les erreurs en − seront aussi faciles que les erreurs en +; 
3°. qu’elles pourront être infinies, et que la fonction qui exprime 
leur facilité ne décroîtra d’une quantité finie que lorsque l’erreur 
sera supposée infinie, en sorte qu’alors elle aille décroissante à 
l’infini. [pp. 253-254] 


To prove this statement he considers a general law of facility Φ(αx) for 
which Φ(αx) = p when x = 0, it being supposed that α is infinitely small. 
Proceeding as in the previous special case he deduces that 

\[ x = \frac{1}{n} \sum_{i=1}^{n-1} q^{(i)} , \]

and then notes that 


Les suppositions sur lesquelles le résultat est fondé ne pouvant 
avoir lieu, il est clair que la méthode des milieux arithmétiques 
est contraire aux règles des probabilités, et qu’ainsi dans des 
cas extrêmement délicats, il faut faire usage des recherches 
précédentes. [p. 255] 


In the ninth and final section Laplace considers the case in which the 
quantity sought is not immediately given by the observations — say that 
one has observed n phenomena from which p unknowns have to be deter- 
mined, with n > p. The theory runs much along the lines discussed before, 
and we shall content ourselves here with noting Laplace’s last paragraph: 


Voilà ce me semble, tout ce que peut fournir la théorie des 
hazards sur la détermination des milieux qu’il faut prendre 
entre les résultats de plusieurs observations, malheureusement 
l’analyse dans l’état d’imperfection où elle est encore, se refuse 
aux opérations qu’exige cette méthode, et si l’on vouloit en faire 
usage il faudroit recourir à des approximations très pénibles; 
mais il n’en est pas moins intéressant de savoir jusques où peut 
nous conduire dans ces matières une juste application du calcul 
des probabilités. [p. 256] 


7.6 Sur les probabilités 


This, the second of Laplace’s works that are particularly germane to our 
present study, appeared in the Histoire de l’Académie royale des Sciences, 
Paris, in the volume for 1778, although it was submitted on the 19th July 
1780 and was published³² in 1781. Although no mention of Bayes or Price 
is made in this memoir, an anonymous abstract³³, in the same volume, of 
this article does in fact comment on their work. Because this summary is 
not reprinted in the Œuvres complètes de Laplace, and is perhaps therefore 
not readily accessible, I give the relevant part of it here: 


Toutes les questions du Calcul des Probabilités peuvent se réduire
à une seule hypothèse, à celle d’une certaine quantité de boules
de différentes couleurs mêlées ensemble, dont on suppose qu’on
tire au hasard différentes boules dans un certain ordre ou dans
certaines proportions. Si on suppose connu le nombre de boules
de chaque espèce, on a le calcul ordinaire des probabilités tel
que les Géomètres du dernier siècle l’ont considéré: mais si l’on
suppose le nombre de boules de chaque espèce inconnu, & que
par le nombre de boules de chaque espèce qu’on a tirées, on
veuille juger ou de la proportion du nombre de ces boules, ou
de la probabilité de les tirer dans la suite suivant certaines loix,
on a une nouvelle classe de problèmes. Ces questions dont il
paroît que Mrs. Bernoulli & Moivre avoient eu l’idée, ont été
examinées depuis par Mrs. Bayes & Price; mais ils se sont bornés
à exposer les principes qui peuvent servir à les résoudre. M. de
la Place les a considérées avec plus d’étendue, & il y a appliqué
l’analyse. [Hist. Acad. r. des Sciences, Paris 1778, pp. 43-44]


The memoir begins with a clear statement of the scope of the study:


je me propose de traiter dans ce Mémoire deux points importants
de l’analyse des hasards qui ne paraissent point avoir
encore été suffisamment approfondis: le premier a pour objet
la manière de calculer la probabilité des événements composés
d’événements simples dont on ignore les possibilités respectives;
l’objet du second est l’influence des événements passés sur la
probabilité des événements futurs, et la loi suivant laquelle, en
se développant, ils nous font connaître les causes qui les ont
produits. [p. 383]


These matters “forment une nouvelle branche de la théorie des probabilités” 
[p. 383]. As regards the first point raised in the above quotation, Laplace 
proposes to give a general method for determining the probability of any 
event whatsoever, when only the law of possibility (“loi de possibilité”) of 
the simple events is known, and, should that law be unknown, to determine 
what ought to be done. The second point leads him to the question of births. 
As a generalization of these investigations, he proposes a method that will 
lead to the determination not only of the possibilities * of simple events, 
but also of any future event whatever. 

When one comes to consider the determination of the probability of 
events (following any law) compounded (or composed) of simple events of 
known possibilities, there are, claims Laplace in his second article, three 
ways of effecting this: 


1. a priori, lorsque, par la nature même des événements, on
voit qu’ils sont possibles dans un rapport donné; ... 2. a
posteriori, en répétant un grand nombre de fois l’expérience qui
peut amener l’événement dont il s’agit, et en examinant combien
de fois il est arrivé; 3. enfin, par la considération des motifs
qui peuvent nous déterminer à prononcer sur l’existence de cet
événement. [pp. 384-385]


*I shall thus translate Laplace’s “possibilités”.


The differences between these three methods are illustrated by an illumi- 
nating example: suppose that the respective skills of two players A and B 
are unknown. As one has no reason to suppose A more skilful than B, one 
may conclude that the probability of A’s winning a match is 1/2. The first
of the above three methods gives the absolute possibility of the events; the 
second makes it approximately known, as will be seen in the sequel, and 
the third gives only their possibility relative to the state of our knowledge. 

The relativeness of all probability to us (or to the state of our knowledge) 
is then emphasized, and Laplace stresses that this does not in fact blur the 
distinction between absolute and relative possibility. 

He next returns to consideration of the problem of the two gamblers,
already discussed in §7.3 above. Assuming that $(1+\alpha)/2$ is the probability
that the more skilful player wins a game, and that there is no reason to
suppose that A is more skilful than B, Laplace shows that the probability
that A will win the first n matches is


$$P = \frac{1}{2}\left[\left(\frac{1+\alpha}{2}\right)^{n} + \left(\frac{1-\alpha}{2}\right)^{n}\right].$$


The next three articles are devoted to variations on, and generalizations
of, this example: we turn briefly to the continuation given in the sixth
article. Supposing that $\alpha \in [0, q]$, and representing the probability of $\alpha$ by
$\phi(\alpha)$, we obtain


$$P = \int_0^{q} \left\{\left[(1+\alpha)^{n} + (1-\alpha)^{n}\right] / 2^{n+1}\right\} \phi(\alpha)\,d\alpha .$$


If, for example, $\phi(\alpha) = l$ (a constant), “en sorte que toutes les valeurs de
$\alpha$ soient également possibles” [p. 394], then $\int_0^{q} \phi(\alpha)\,d\alpha = 1$ implies $l = 1/q$,
and hence


$$P = \left[(1+q)^{n+1} - (1-q)^{n+1}\right] \big/ \left[(n+1)\,q\,2^{n+1}\right],$$


which, it should be noted, reduces, for q = 1, to 1/(n + 1), the same
result as that obtained in the corollary to Bayes’s eighth proposition.
Commenting, in the twelfth article, on the law of possibility of the skills 
of the players, Laplace points out that this law is able to be known only 
because of a long sequence (“suite”) of observations, in the absence of which 
the most likely functions should be chosen: “l’analyse des hasards, qui
n’est en elle-même que l’art d’apprécier les vraisemblances, doit donc nous
guider dans ce choix” [p. 409].
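
The closed form obtained in the sixth article is easily checked by machine. The following Python sketch is a modern illustration and nothing of the kind is to be found in Laplace: the function names are hypothetical, and the midpoint rule is merely one convenient way of approximating the integral over $\alpha$ with the uniform density 1/q.

```python
from fractions import Fraction


def closed_form(n, q):
    """Exact value of [(1+q)^(n+1) - (1-q)^(n+1)] / [(n+1) q 2^(n+1)]."""
    q = Fraction(q)
    return ((1 + q) ** (n + 1) - (1 - q) ** (n + 1)) / ((n + 1) * q * 2 ** (n + 1))


def by_quadrature(n, q, steps=10_000):
    """Midpoint-rule value of the integral over alpha in [0, q] with density 1/q."""
    h = q / steps
    total = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h
        total += ((1 + alpha) ** n + (1 - alpha) ** n) / 2 ** (n + 1)
    return total * h / q


if __name__ == "__main__":
    for n in (1, 2, 5):
        # For q = 1 the exact value should be 1/(n+1).
        print(n, closed_form(n, 1), closed_form(n, Fraction(1, 2)), by_quadrature(n, 0.5))
```

Exact rational arithmetic is used for the closed form so that the reduction to 1/(n + 1) at q = 1 can be confirmed without rounding error.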
In his fourteenth article Laplace points out that, while one may very well
have no reason initially to attribute more ability to any one of the players
than to the others, new light is gained as to their respective skills as the
matches continue, which skills would be exactly known were the number of
games to become infinite. He proposes in this article to consider the effect
past events exercise on future events.

Denoting by E the past event, by e the future event “dont on propose de
calculer la probabilité P” [p. 414] (though in fact a conditional rather than
an absolute probability is found), and by E + e “un événement composé
de l’événement E arrivant le premier et de l’événement e arrivant ensuite”
[pp. 414-415], Laplace shows that


$$\Pr[e \mid E] = P = \Pr[E + e] / \Pr[E] .$$


From this it is but a short step to the idea of independence and the
factorization Pr[E + e] = Pr[E] Pr[e], which in turn leads naturally to the
question of the determination of the probability of causes as deduced from
events.

So much really by way of introduction: it is only now that Laplace starts 
considering that which is actually our topic. For in his fifteenth article he 
turns his attention to the matter introduced at the end of the fourteenth 
(a matter already examined in his Mémoire sur la probabilité des causes 
par les événements). He supposes that an event E can occur in conjunction
with one and only one of the n causes $A_1, A_2, \ldots, A_n$ (or $A, A', \ldots, A^{(n-1)}$
in his notation), and deduces the formula (here given in modern notation)


$$\Pr[A_i \mid E] = \Pr[E \mid A_i] \Big/ \sum_{j=1}^{n} \Pr[E \mid A_j] \qquad (16)$$


under the following assumptions: for each $i \in \{1, 2, \ldots, n\}$,

(i) $\Pr[A_i] = 1/n$, and

(ii) events $E_1, E_2, \ldots$ similar (“semblable”) to the event E in question
are conditionally independent with respect to each $A_i$.


One recognizes in (16) above, of course, a “discrete Bayes’s Theorem” 
with uniform prior. Notice too that we have here a proof of the “Principe” 
discussed in §7.3 above. (A proof of (16) may be found in Appendix 7.2.) 
The article is concluded with a verbal expression of the final algebraic 
result. 
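
The discrete rule (16) may be sketched in a few lines of Python. This is offered only as a modern illustration under the stated assumption of equally likely causes: the function names and the three numerical likelihoods are hypothetical and are not taken from Laplace.

```python
from fractions import Fraction


def posterior_uniform_prior(likelihoods):
    """Return Pr[A_i | E] for equally likely causes, given the values Pr[E | A_i]."""
    likelihoods = [Fraction(x) for x in likelihoods]
    total = sum(likelihoods)
    return [x / total for x in likelihoods]


def posterior_general(priors, likelihoods):
    """Same computation with explicit priors Pr[A_i], for comparison."""
    joint = [Fraction(p) * Fraction(x) for p, x in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # Hypothetical example: three causes under which E has probability 1/2, 1/3, 1/6.
    like = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    print(posterior_uniform_prior(like))
    # With priors all equal to 1/3 the common factor cancels, as in (16).
    print(posterior_general([Fraction(1, 3)] * 3, like))
```

The second function is included only to make visible that the prior 1/n cancels from numerator and denominator, which is why it does not appear in (16).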

In his sixteenth article Laplace proceeds to illustrate the results of the 
preceding article by what, on his own admission, is a very simple example. 
Let A and B be two players of unknown skills. It being exceedingly unlikely 
that these skills are perfectly equal, let $(1+\alpha)/2$ and $(1-\alpha)/2$ denote the
greater and the lesser respectively. By Article II the probability that A
will win the first two games is $P = (1+\alpha^{2})/4$. If, however, one wants the
probability that, B having won the first match, A will win the following two,
it is clear that the preceding value of P is too large. In fact, if one considers
each skill as a particular cause of the event, the probability that B’s skill
is $(1+\alpha)/2$ will be, by the preceding article, equal to the probability that
B, having this skill, will win the first game, divided by the sum of the
probabilities that he will win in having successively the skills $(1+\alpha)/2$ and
$(1-\alpha)/2$, a probability that becomes


$$\frac{(1+\alpha)/2}{(1+\alpha)/2 + (1-\alpha)/2} = (1+\alpha)/2 .$$


In the notation of Article XIV, with E the winning of the first match by B
and e the winning of the following two by A, we have $V = \Pr[E] = (1+\alpha)/2$
or $(1-\alpha)/2$ according as the greater or lesser skill is B’s. On taking the
moiety of the sum of these values (and no further argument for this uniform
assumption is given) we find that V = 1/2. Similarly the probability v of E + e
is $[(1-\alpha)/2][(1+\alpha)/2]^{2}$ or $[(1+\alpha)/2][(1-\alpha)/2]^{2}$, and hence $v = (1-\alpha^{2})/8$.
Thus


$$P = v/V = (1-\alpha^{2})/4 .$$


The preceding argument is then generalized to the case of finding the
probability P that, B having won the first match, A will win the next n.
This is

$$P = (1-\alpha^{2})\left[(1+\alpha)^{n-1} + (1-\alpha)^{n-1}\right] \big/ 2^{n+1},$$


which, for small $\alpha$, reduces approximately to


Thus far the use of the “discrete Bayes’s rule”. One should note the role 
played by the “equally likely” assumption (though this is often tacit). 
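
The computation of the sixteenth article may be retraced step by step. The Python sketch below is a modern illustration only (the function names are hypothetical): it forms V = Pr[E] and v = Pr[E + e] by averaging over the two equally likely "causes", namely that B is the stronger or the weaker player, and compares the quotient v/V with the closed form $(1-\alpha^{2})[(1+\alpha)^{n-1} + (1-\alpha)^{n-1}]/2^{n+1}$.

```python
from fractions import Fraction


def prob_after_first_loss(alpha, n):
    """Pr[A wins the next n games | B won the first], skills (1 +/- alpha)/2."""
    alpha = Fraction(alpha)
    hi, lo = (1 + alpha) / 2, (1 - alpha) / 2
    half = Fraction(1, 2)  # each assignment of the skills is taken as equally likely
    V = half * hi + half * lo                    # Pr[E]: B wins the first game
    v = half * hi * lo ** n + half * lo * hi ** n  # Pr[E + e]: B wins, then A wins n
    return v / V


def closed_form(alpha, n):
    """(1 - alpha^2) [(1+alpha)^(n-1) + (1-alpha)^(n-1)] / 2^(n+1)."""
    alpha = Fraction(alpha)
    return (1 - alpha ** 2) * ((1 + alpha) ** (n - 1) + (1 - alpha) ** (n - 1)) / 2 ** (n + 1)


if __name__ == "__main__":
    a = Fraction(1, 3)  # hypothetical value of alpha
    for n in range(1, 6):
        print(n, prob_after_first_loss(a, n), closed_form(a, n))
```

For n = 2 both functions return $(1-\alpha^{2})/4$, the value found in the text.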

Laplace next, in the seventeenth article, turns his attention to the proba- 
bility of causes as deduced from events. Before doing so, however, he states 
quite clearly that absence of knowledge entails an equiprobable distribution 
of the possibilities: in his own words, 


lorsqu’on n’a aucune donnée a priori sur la possibilité d’un
événement, il faut supposer toutes les possibilités, depuis zéro
jusqu’à l’unité, également probables. [p. 419]


Todhunter [1865, art. 893] regards this as the same as the principle enun- 
ciated by Laplace in his Mémoire sur la probabilité des causes par les 
événements; but as we have already seen in our discussion of that work, the 
equiprobability assumption is at best tacit there, and the present memoir 
has the first clear statement of this assumption. 
Laplace now applies this principle to the problem considered in Article
III of his second memoir of 1774, stating it however in terms of births of
boys and girls rather than the drawing of white and black tickets from an
urn. Of p+q children, p are boys and q girls: what is the probability P that
m+n future births will give rise to m boys and n girls? Now the probability
that, in p+q births, p will be boys and q girls (the event denoted by E in
Article XIV) is

$$A x^{p}(1-x)^{q},$$


where $A = \binom{p+q}{p}$ and x is the probability of the birth of a boy. Similarly,
the probability that, of the p+q infants first born, p will be boys and q
girls, and that of the following m+n births, m will be boys and n girls
(the compound event denoted by E + e in Article XIV) is


$$\gamma A x^{p+m}(1-x)^{q+n},$$


where $\gamma = \binom{m+n}{m}$. (Note the use of independence and constant probability.)
Laplace now makes use of his equiprobability assumption. To leave no
room for doubt, let us consider his exact words:
Laplace now makes use of his equiprobability assumption. To leave no 
room for doubt, let us consider his exact words: 


maintenant, x étant susceptible de toutes les valeurs depuis
x = 0 jusqu’à x = 1, et toutes ces valeurs étant a priori
également probables, il faut, pour avoir la véritable probabilité
de E, multiplier $A x^{p}(1-x)^{q}$ par a dx, a étant constant, et prendre
l’intégrale $A \int a x^{p}(1-x)^{q}\,dx$ (depuis x = 0 jusqu’à x = 1)
[p. 420]


the value of a being determined from $\int_0^1 a\,dx = 1$, whence a = 1. Similarly,
the probability of E + e is


$$A\gamma \int_0^1 x^{p+m}(1-x)^{q+n}\,dx ,$$


and thus the desired probability P, a probability that is of course
conditional on the p+q earlier births, is, by Article XIV,


$$P = \gamma \int_0^1 x^{p+m}(1-x)^{q+n}\,dx \Big/ \int_0^1 x^{p}(1-x)^{q}\,dx .$$


The remainder of this article is devoted to the evaluation of these
integrals. Laplace shows firstly that


$$P = \gamma\,\frac{(q+1)(q+2)\cdots(q+n)\,(p+1)(p+2)\cdots(p+m)}{(p+q+2)(p+q+3)\cdots(p+q+m+n+1)} . \qquad (17)$$


Noting that


$$\log(1\cdot 2\cdot 3\cdots u) = \tfrac{1}{2}\log 2\pi + \left(u+\tfrac{1}{2}\right)\log u - u + \frac{1}{12u} - \cdots ,$$


Laplace suggests the use of the approximation

$$1\cdot 2\cdot 3\cdots u = \sqrt{2\pi}\,u^{u+\frac{1}{2}}e^{-u} . \qquad (18)$$


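If one supposes that p and q are “très grands nombres” [p. 421] and that,
approximately,


$$(p+q+1)/(p+q+m+n+1) = (p+q)/(p+q+m+n),$$

substitution of (18) in (17) yields


$$P = \gamma\,\frac{(p+m)^{p+m+\frac{1}{2}}\,(q+n)^{q+n+\frac{1}{2}}\,(p+q)^{p+q+\frac{3}{2}}}{p^{p+\frac{1}{2}}\,q^{q+\frac{1}{2}}\,(p+q+m+n)^{p+q+m+n+\frac{3}{2}}} .$$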


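Finally, if m and n are very small in comparison with p and q, the
approximations


$$(p+m)^{p+m+\frac{1}{2}} \approx e^{m}\,p^{p+m+\frac{1}{2}},$$

$$(q+n)^{q+n+\frac{1}{2}} \approx e^{n}\,q^{q+n+\frac{1}{2}},$$

$$(p+q+m+n)^{p+q+m+n+\frac{3}{2}} \approx e^{m+n}\,(p+q)^{p+q+m+n+\frac{3}{2}}$$

enable us to write P as


$$P = \gamma\,\frac{p^{m}q^{n}}{(p+q)^{m+n}} ,$$


which may perhaps be more suggestively written in the form


$$P = \gamma\left(\frac{p}{p+q}\right)^{m}\left(\frac{q}{p+q}\right)^{n} .$$
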
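
The quality of this last approximation is easily examined numerically. The Python sketch below is a modern illustration, not Laplace's procedure: the function names are hypothetical, formula (17) is evaluated exactly with the binomial factor $\gamma = \binom{m+n}{m}$ included, and the birth figures p = 251,527 and q = 241,945 are those quoted later in this section for Paris.

```python
from fractions import Fraction
from math import comb


def predictive_exact(p, q, m, n):
    """Formula (17): probability of m boys and n girls in m+n future births."""
    numerator = 1
    for i in range(1, m + 1):
        numerator *= p + i
    for j in range(1, n + 1):
        numerator *= q + j
    denominator = 1
    for k in range(2, m + n + 2):  # factors (p+q+2) ... (p+q+m+n+1)
        denominator *= p + q + k
    return comb(m + n, m) * Fraction(numerator, denominator)


def predictive_plugin(p, q, m, n):
    """Approximation gamma * (p/(p+q))^m * (q/(p+q))^n for large p and q."""
    return comb(m + n, m) * (p / (p + q)) ** m * (q / (p + q)) ** n


if __name__ == "__main__":
    p, q = 251_527, 241_945
    for m, n in ((1, 0), (2, 1), (5, 5)):
        print(m, n, float(predictive_exact(p, q, m, n)), predictive_plugin(p, q, m, n))
```

With p = 1, q = 0, m = 1, n = 0 the exact function returns 2/3, the familiar value of the rule of succession, which gives a quick check that the product limits are correct.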

Laplace begins his eighteenth article by pointing out that the probability 
P obtained at the end of the preceding one is that which one would reach 
were one to suppose the possibilities of the births of boys and girls to 
be in the ratio of p to q, from which it is natural to conclude that these 
possibilities are (in fact) very nearly in that ratio, the true possibility of 
the birth of a boy thus being approximately p/(p + q). (One sees here, 
indeed, the shadow of James Bernoulli, in the approximation of a possibility 
by an observed frequency.) This “approximately” is to be interpreted in
a probabilistic sense, viz. that p/(p+q) and neighbouring values are
incomparably more probable than others (again vide Bernoulli).

This comment permits the reformulation of the preceding conclusion as 
follows: 


si l’on désigne par $\theta$ une quantité fort petite et par P la probabilité
que la possibilité de la naissance d’un garçon est comprise
dans les limites $p/(p+q) - \theta$ et $p/(p+q) + \theta$, la valeur
de P différera d’autant moins de la certitude ou de l’unité que
p et q seront de plus grands nombres, et l’on peut tellement
faire croître p et q que la différence de P à l’unité soit moindre
qu’aucune grandeur donnée, quelque petit que $\theta$ soit d’ailleurs
[pp. 422-423],


a result to the proof of which the present article is devoted.

Noting that this result is only true in the limit (“dans l’infini”), Laplace
proposes to consider an approximation to P by a series that is rapidly
convergent. From Article XV, the probability P that x (the possibility
of the birth of a boy) lies between the two limits $\theta_1$ and $\theta_2$, say (where
$\theta_1 < \theta_2$), is


$$P = \int_{\theta_1}^{\theta_2} x^{p}(1-x)^{q}\,dx \Big/ \int_0^1 x^{p}(1-x)^{q}\,dx .$$


The problem thus reduces to the evaluation of the incomplete beta-integral
$\int x^{p}(1-x)^{q}\,dx$ when p and q are large. To this end, let $y = x^{p}(1-x)^{q}$:
then

$$y\,dx = \frac{x(1-x)\,dy}{p-(p+q)x} .$$


Letting $p = 1/\alpha$ and $q = \mu/\alpha$, where $\alpha$ is a very small fraction, we obtain


$$y\,dx = \alpha z\,dy \quad\text{or}\quad z = \frac{y}{\alpha}\,\frac{dx}{dy} ,$$


where $z = x(1-x)/[1-(1+\mu)x]$. Integration by parts yields


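$$\int y\,dx = C + \alpha yz - \alpha^{2}yz\,\frac{dz}{dx} + \alpha^{2}\int y\,\frac{d}{dx}\!\left(z\,\frac{dz}{dx}\right)dx , \qquad (19)$$


where C is an arbitrary constant. (This expression is from Todhunter [1865,
art. 895]: Laplace gives it in the more suggestive form


$$\int y\,dx = C + \alpha yz\left\{1 - \alpha\,\frac{dz}{dx} + \alpha^{2}\,\frac{d(z\,dz)}{dx^{2}} - \alpha^{3}\,\frac{d[z\,d(z\,dz)]}{dx^{3}} + \cdots\right\} .)$$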

Laplace next shows that, for any $x < 1/(1+\mu)$,


$$\alpha yz\left(1 - \alpha\,\frac{dz}{dx}\right) < \int y\,dx < \alpha yz .$$


Cette remarque peut servir lorsque, sans chercher la valeur exacte
de $\int y\,dx$, on veut s’assurer si elle est plus grande ou plus
petite qu’une quantité donnée. [p. 425]
Laplace next evaluates $\int y\,dx$ from $x = p/(p+q) - \theta$ to $x = p/(p+q) + \theta$
(or equivalently from $1/(1+\mu) - \theta$ to $1/(1+\mu) + \theta$), which, together with
the Stirling-de Moivre approximation for n!, yields, on neglecting terms of
order a®/?, 


7 Jap i [12u? + (1+ 4)2(1 + p+ w)6?] . 
O,/2m(1+4+ pu)? 12u(1 + p)30? 


e ~(1+4+ p)6Pr" ¢ + Ltt)" +f1+(1+ mor?) (: a ng) 


The factor in the last pair of braces being extremely small in the present 
question, Laplace concludes that 


il est visible que l’on peut tellement augmenter p et q, et, par
conséquent, diminuer $\alpha$, que cette différence de P à l’unité soit
moindre qu’aucune grandeur donnée, ce qui est le théorème dont
nous avons parlé au commencement de cet article. [p. 429]


The results of this article are applied in the following one to the question
of the apparent excess of male over female births in Paris from 1745 to
1770. While Laplace states that his concern is to determine “combien il est
probable que les naissances des garçons dans cette grande ville sont plus
possibles que celles des filles” [p. 429], it is in fact perhaps worth noting
that what is actually found [p. 430] is the probability that the possibility
of the birth of a boy is less than or equal to 1/2. This is achieved by taking
$\theta = (1-\mu)/[2(1+\mu)]$. With p = 251,527 and q = 241,945, the desired
probability (i.e. that x exceeds 1/2) is seen to differ from 1 by the fraction
$1.1521/10^{42}$. Laplace’s conclusion is


on peut regarder comme aussi certain qu’aucune autre vérité
morale, que la différence observée à Paris entre les naissances
des garçons et celles des filles est due à une plus grande
possibilité dans la naissance des garçons. [pp. 431-432]
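
The order of magnitude of Laplace's figure can be recovered with very little machinery. The Python sketch below is a modern illustration and does not follow Laplace's series: it assumes a uniform prior, so that the possibility x has a Beta(p+1, q+1) posterior, and it replaces that posterior by a normal curve with the same mean and variance (an assumption made here for convenience). The function name is hypothetical.

```python
from math import erfc, sqrt


def prob_possibility_at_most_half(p, q):
    """Normal approximation to Pr[x <= 1/2] under a Beta(p+1, q+1) posterior."""
    a, b = p + 1, q + 1
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    z = (mean - 0.5) / sqrt(variance)
    # Lower tail of the standard normal curve beyond z standard deviations.
    return 0.5 * erfc(z / sqrt(2))


if __name__ == "__main__":
    # Paris births, 1745 to 1770, as quoted in the text.
    print(prob_possibility_at_most_half(251_527, 241_945))
```

For the Paris data the value returned is of the order of $10^{-42}$, in keeping with the fraction quoted above; exact agreement with Laplace's figure is not to be expected from so crude an approximation in the extreme tail.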


This is followed by some comments on births in London, it being noted 
that the ratio of births of boys to those of girls here is greater than the 
ratio in Paris.

In Article XX Laplace proposes, using the data of the preceding article,
to determine the probability that, in any given year, the number of births
of boys does not surpass that of girls. Supposing that of 2a births (the
mean number in a given year) m are male, Laplace obtains from formula
(17) above


$$P = \frac{(2a)!\,(p+q+1)!\,(q+2a-m)!\,(p+m)!}{(p+q+2a+1)!\,p!\,q!\,(2a-m)!\,m!} ,$$
the “sum” of which, taken over all values of m, yields the desired result.
Denoting by $y_m$ the expression $(q+2a-m)!\,(p+m)!/[(2a-m)!\,m!]$,
Laplace shows that


$$y_m = \frac{(m+1)(q+2a-m)}{(2a-m)(p+m+1)}\,y_{m+1} .$$


More generally, he is led to consideration of the finite difference equation


$$y_m = z_m\,\Delta y_m ,$$


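whence he deduces

$$\sum y_m = C + y_m z_{m-1}\left\{1 - \Delta z_{m-2} + \Delta\left[z_{m-2}\,\Delta z_{m-3}\right] - \Delta\left[z_{m-2}\,\Delta\left(z_{m-3}\,\Delta z_{m-4}\right)\right] + \cdots\right\} ,$$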


analogous to the expression (19) above. As in the discussion of that ex- 
pression, two approximate bounds for the exact solution are obtained, and 
the results are applied to birth data from Paris (it being shown that the 
probability that the number of male births does not exceed that of female 
in one year is less than 1/259) and London, the probability of the event 
concerned being even smaller here. A similar, but more difficult, problem 
is treated in the Théorie analytique des probabilités (see Todhunter [1865, 
art. 897]). 

In the twenty-first article Laplace points out that the preceding theory 
required the knowledge of the number of times each simple event had hap- 
pened. It is thus but a particular case of that part of the analysis of chance 
(“des hasards”) that consists in going back from events to causes; and in 
subsequent articles he proposes to consider 


une méthode générale pour déterminer les possibilités des événe- 
ments simples, quel que soit l’événement composé dont on a 
observé l’existence. [p. 439]


The present article is really preparatory to the satisfaction of this avowed 
aim, and in it Laplace examines in more detail the problem, already dis- 
cussed in Article III, of the determination of the skills of two gamblers. 

A direct and general method for the determining of the possibilities of 
simple events, irrespective of the event observed, is considered by Laplace 
in the twenty-second article. Denoting by x and 1 − x the desired
possibilities of simple events, he notes that the probability of the compound event
in question will be a function of x multiplied by some coefficient. Calling
this function y and denoting by a the value of x, positive and less than
one, that maximizes it, Laplace notes that not only is this value the most
probable, but it is also the limiting value of the true possibility x. This
claim is illustrated by several examples [pp. 441-442]. 
Laplace next points out that the integral $\int y\,dx$, taken over a very small
interval about the maximum, is then very close to the same integral
evaluated between 0 and 1:


or le rapport de la première de ces intégrales à la seconde
exprime la probabilité que la valeur de x est comprise dans cet
intervalle. [p. 442]


He concludes by mentioning (and adduces an example in support of this 
assertion) that compound events are not at all suitable for determining the 
possibilities of simple events. 

The need for the evaluation of the ratio of the integrals of this article 
having been considered in Article XVIII (cf. also the start of Article XXV), 
Laplace proposes in Article XXIII to generalize these results and to extend 
them to all values of y. This leads to an equation of the form 


$$y\,dx = \alpha z\,dy ,$$


z being a function of x that contains no powers of order $1/\alpha$ at all.

While the methods of Article XVIII (aided by the “beau théorème de M.
Stirling sur la valeur du produit 1.2.3...u, lorsque u est un très grand
nombre” [p. 445]) may be used, Laplace’s search for a more direct method leads
eventually to the evaluation of $\int e^{-t^{2}}\,dt$. This he does by considering


$$\int_0^{\infty}\!\!\int_0^{\infty} e^{-s(1+u^{2})}\,du\,ds \quad\text{and}\quad \int_0^{\infty}\!\!\int_0^{\infty} e^{-s(1+u^{2})}\,ds\,du$$


and equating the results. This leads, for very small a, to 


1 2 2 
d*y 
= 3 a 
(| yz = ony / Ge?’ 


the right-hand side being evaluated at x = a (the value of x corresponding
to the maximum of y). The problem is repeated in the Théorie analytique
des probabilités (see Todhunter [1865, art. 899]).
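
The accuracy of an approximation of this kind can be gauged on the beta-integral itself, whose exact value is p! q!/(p+q+1)!. The Python sketch below is a modern illustration under stated assumptions, not a transcription of Laplace's working: it takes $y = x^{p}(1-x)^{q}$, locates the maximum at a = p/(p+q), and uses the leading-order relation $(\int y\,dx)^{2} \approx 2\pi y^{3}/(-d^{2}y/dx^{2})$ evaluated there. The function names are hypothetical, and logarithms are used throughout because the integrals are far too small to represent directly.

```python
from math import lgamma, log, pi


def log_integral_exact(p, q):
    """log of the integral of x^p (1-x)^q over [0, 1], i.e. log[p! q! / (p+q+1)!]."""
    return lgamma(p + 1) + lgamma(q + 1) - lgamma(p + q + 2)


def log_integral_laplace(p, q):
    """Leading-order approximation built from y and its second derivative at the maximum."""
    a = p / (p + q)                       # maximizer of y
    log_y = p * log(a) + q * log(1 - a)   # log y(a)
    # At the maximum, -y''/y = p/a^2 + q/(1-a)^2.
    curvature = p / a ** 2 + q / (1 - a) ** 2
    # (integral)^2 ~ 2 pi y^3 / (-y'')  =>  integral ~ y * sqrt(2 pi / curvature)
    return log_y + 0.5 * log(2 * pi / curvature)


if __name__ == "__main__":
    for p, q in ((50, 40), (251_527, 241_945)):
        print(p, q, log_integral_exact(p, q), log_integral_laplace(p, q))
```

The two logarithms differ by a quantity of order 1/(p+q), so that for the Paris figures the approximation is, for practical purposes, indistinguishable from the exact value.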

This question is further pursued in Article XXIV: certain errors in the 
formulae of this article are exposed in Todhunter [1865, art. 900].

In Article XXV the methods of Articles XXIII & XIV are used to derive
an approximate expression for $\int x^{p}(1-x)^{q}\,dx$. Laplace’s method (“si je
ne me trompe” [p. 456]) is stated to be “more direct”, independently of
its generality, than those of Stirling and Euler, and this is illustrated by
consideration of $y = x^{p}e^{-x}$, whence


$$1\cdot 2\cdot 3\cdots p = p^{p+\frac{1}{2}}\,e^{-p}\sqrt{2\pi}\left(1 + \frac{1}{12p} + \cdots\right) .$$


Laplace begins his next article by reminding us that we saw, in Article
XIX, that the ratio of births of boys to girls is sensibly greater in London
than in Paris; an observation that seems to indicate a greater facility in
London for the birth of boys. He asserts further that the preceding method,
more easily than any other, will permit the ascertaining of how probable
this difference is. Defining


u : the probability of the birth of a boy in Paris;

p : the number of births of boys observed in that city;

q : the number of births of girls observed in that city;

u − x : the possibility of the birth of a boy in London;

p′ : the number of births of boys observed there;

q′ : the number of births of girls observed there,


we find that the probability of the double event is proportional to

$$u^{p}(1-u)^{q}(u-x)^{p'}(1-u+x)^{q'} .$$


Thus the probability P that the birth of a boy is less possible in London
than in Paris is given by


$$P = \frac{\iint u^{p}(1-u)^{q}(u-x)^{p'}(1-u+x)^{q'}\,dx\,du}{\iint u^{p}(1-u)^{q}(u-x)^{p'}(1-u+x)^{q'}\,dx\,du} .$$


The integration in the denominator is over all values of u and x, while
that in the numerator is over x = 0 to u and u = 0 to 1 (Laplace’s limits
are wrong here). The rest of the article is devoted to the evaluation of
the double integrals: the approximate solution for the data gathered is
P = 1/410,458, and the final conclusion is


ainsi l’on peut regarder comme une chose très probable qu’il
existe, dans la première de ces deux villes, une cause de plus
que dans la seconde, qui y facilite les naissances des garçons, et
qui dépend soit du climat, soit de la nourriture et des mœurs.
[p. 466]


In the next article Laplace extends the theory of the preceding articles 
to a larger number of simple events, the theory being illustrated by an 
(infinite) urn problem (with balls of three different colours). Once again it is 
stated that the value of x that maximizes a certain integral is consequently
the most probable value of x. 

Consideration thus far has been limited to the case of a uniform prior: 
as Laplace writes at the beginning of his twenty-eighth article, 


jusqu’ici nous avons supposé la loi de possibilité des événements
simples constante depuis zéro jusqu’à l’unité, et cette supposition
est, comme nous l’avons observé dans l’article XVII, la
seule que l’on doive adopter, lorsqu’on n’a aucune donnée
relativement à ces possibilités. [p. 469]
Here he proposes to consider the case in which the law (i.e. the prior)
is known exactly. Limiting himself to the case of only two simple events
of possibilities x and 1 − x, Laplace deduces from Article XV that the
probability P that the value of x lies between $\theta_1$ and $\theta_2$ (say) is


$$P = \int_{\theta_1}^{\theta_2} u\,s\,y\,dx \Big/ \int_0^1 u\,s\,y\,dx ,$$


where u (= u(x)) denotes the facility of the possibility x of the first event,
s denotes the facility of the possibility 1 − x of the second event, and y
is the probability of the observed event.

In his twenty-ninth article Laplace turns his attention to the question
of the determination of a future event as determined by known events.
Denoting by x and 1 − x the possibilities of two simple events and by
s and s′ the facilities of x and 1 − x respectively, one can calculate the
probabilities, both of the observed event and of the future event, proceeding
from these probabilities, a procedure that yields two functions of x, say y
and u respectively. By Articles XIV & XV the desired probability P is then
given by

$$P = \int_0^1 s\,s'\,u\,y\,dx \Big/ \int_0^1 s\,s'\,y\,dx .$$


If the event is very complicated, the method of Article XXIII may be used 
to evaluate these integrals by a very rapidly convergent approximation. 

Particular attention is paid to the case in which one has no information
about the law of possibility of the two simple events, in which case
one must suppose s = s′ = 1, and Laplace also points out that his
approximation ceases to be exact if the future event concerned is itself very
complicated.

The investigations of this article lead to the following “théoréme assez 
remarquable”: 


la probabilité d’un événement futur, pareil a celui que l’on a 
observé, est a cette méme probabilité, déterminée en employant 
pour les possibilités des événements simples celles qui résultent 
de l’événement observé, comme 1 est à $\sqrt{2}$. [p. 475]


As an illustration Laplace refers to the example already considered in
Article XVII, i.e. given p+q births (p boys), the probability P that p+q
future births will result in p boys and q girls is


$$P = \binom{p+q}{p}\left(\frac{p}{p+q}\right)^{p}\left(\frac{q}{p+q}\right)^{q}\frac{1}{\sqrt{2}} . \qquad (20)$$

It is worth noting that the expression (20) given above for P is not
obtainable from that of Article XVII simply by replacing m and n in that
article by p and q respectively. Laplace’s concern here seems to be with
the repetition of the compound event of p+q births. This observation is
supported by his comment on the case of n repetitions, viz.


si l’on cherche la probabilité P que l’événement observé sera
suivi d’un nombre n d’événements pareils, on aura $u = y^{n}$,
et l’on trouvera $P = v^{n}/\sqrt{n+1}$, v étant ce que devient y,
lorsqu’on y substitue pour x la valeur a qui rend y un maximum,
et cette équation a également lieu, n étant fractionnaire. [p. 475]
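
The "théorème assez remarquable" lends itself to a direct numerical check. The Python sketch below is a modern illustration only (the function names and the figures p = 600, q = 400 are hypothetical): under a uniform prior it computes, from formula (17) with m = p and n = q, the probability that p+q further births again give p boys and q girls, and compares it with the value obtained by simply substituting p/(p+q) for the possibility of a boy. The ratio of the two should be close to $1/\sqrt{2}$ when p and q are large.

```python
from math import exp, lgamma, log, sqrt


def log_binomial(n, k):
    """log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def repeat_probability_exact(p, q):
    """Pr[p boys, q girls in p+q future births | p boys, q girls observed]."""
    log_prob = (
        log_binomial(p + q, p)
        + lgamma(2 * p + 1) + lgamma(2 * q + 1) - lgamma(2 * p + 2 * q + 2)  # beta-integral, doubled exponents
        - (lgamma(p + 1) + lgamma(q + 1) - lgamma(p + q + 2))               # beta-integral, observed event
    )
    return exp(log_prob)


def repeat_probability_plugin(p, q):
    """Same probability with the possibility of a boy fixed at p/(p+q)."""
    a = p / (p + q)
    return exp(log_binomial(p + q, p) + p * log(a) + q * log(1 - a))


if __name__ == "__main__":
    p, q = 600, 400
    ratio = repeat_probability_exact(p, q) / repeat_probability_plugin(p, q)
    print(ratio, 1 / sqrt(2))
```

The departure of the ratio from $1/\sqrt{2}$ is of order 1/(p+q), which illustrates why the substitution of the observed ratio for the unknown possibility, harmless for a single future birth, becomes misleading when the future event is as complicated as the observed one.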


But this technique is not to be regarded as universally applicable, and 
Laplace does in fact sound a warning: 


on s’exposerait donc alors a des erreurs considérables, en em- 
ployant, dans le calcul de la probabilité des événements fu- 
turs, les possibilités des événements simples qui résultent de 
l’événement observé: en effet, il est visible que la petite erreur 
que l’on peut commettre, en faisant usage de ces possibilités, 
s’accumule en raison du nombre des événements simples qui 
entrent dans l’événement futur, et doit occasionner une erreur 
sensible lorsqu’ils y sont en très grand nombre. [p. 475]


The rest of the memoir is devoted to a discussion of the theory of errors,
most of which discussion is not of direct concern to us. However, since we
have already looked at something on this topic in our examination of the
1774 Mémoire sur la probabilité des causes par les événements, and since
we shall have occasion to consider something similar in the memoir to be
discussed in §7.11 below, it seems wise to look at the topic now.

As one of the most useful problems in this part of the analysis of chance 
(“hasards”), which consists in going back from events to the causes that 
produce them, Laplace cites, in his Article XXX, the determination of 
the mean of the results of several observations. Having referred to his 1774 
memoir and related work by Lagrange, Daniel Bernoulli and Euler, Laplace 
states that he proposes here to resume this matter and to present his re- 
sults in such a way as to leave no doubt as to their precision. 

In his memoir of 1774 Laplace had assumed that the errors were identically
distributed: now, although still retaining the even distribution of the
errors, he supposes that the facilities of the errors for the first, second, ...
observer are $\phi(x), \phi'(x), \ldots$ respectively. Although, as Sheynin [1977, p. 8]
has pointed out, “the condition of asymptotic decline of the density is now
omitted”, this condition is reintroduced in Article XXXII. Supposing that
the errors of the first observation (“celle qui fixe le plus tôt le phénomène”
[p. 476]), the second, third, ... are x, p − x, p′ − x, ..., Laplace arrives at
the density


$$y = \phi(x)\,\phi'(p-x)\,\phi''(p'-x)\cdots .$$

By Article XV the probabilities of the different values of x are to each
other “comme les probabilités que, ces valeurs ayant lieu, les observations
s’écarteront entre elles des quantités observées p, p′, p″, ...” [p. 477]. Thus
the ordinates of y are proportional to the probabilities of the corresponding
abscissae x, “et par cette raison nous la nommerons courbe des probabilités”
[p. 477].

Laplace next points out that by “milieu” or “résultat moyen” of any 
number of observations one may intend an infinity of different things, ac- 
cording as one subjects the result to some or other condition. 


Par exemple, on peut exiger que ce milieu soit tel que la somme
des erreurs à craindre en plus soit égale à la somme des erreurs
à craindre en moins; on peut exiger que la somme des erreurs
à craindre en plus, multipliées par leurs probabilités respectives,
soit égale à la somme des erreurs à craindre en moins,
multipliées par leurs probabilités respectives. On peut encore
assujettir ce milieu à être le point où il est le plus probable
que doit tomber le véritable instant du phénomène, comme M.
Daniel Bernoulli l’a fait. [p. 477]


Following Sheynin [1977, pp. 8-9] we formulate these conditions as follows: 


of yde= f yde (: yy aN wa) 


{x:c>N} {z:r<N} 


- . 
(ii) cyde = | ry dz (« yee Priel 5 Pe al): 


{a:2>N} — f{aie<N} 


(111) maximum likelihood, 


where N is the maximum possible error. 

In general, while one may impose an infinity of (other) similar condi- 
tions, each of which will give a different mean, they are not all arbitrary. 
There is one that obtains by the nature of the problem and that serves to 
fix the mean that it is necessary to choose between several observations: 


cette condition est que, en fixant à ce point l’instant du phénomène,
l’erreur qui en résulte soit un minimum; or comme, dans
la théorie ordinaire des hasards, on évalue l’avantage en faisant
une somme des produits de chaque avantage à espérer, multiplié
par la probabilité de l’obtenir, de même ici l’erreur doit
s’estimer par la somme des produits de chaque erreur à craindre,
multipliée par sa probabilité; le milieu qu’il faut choisir
doit donc être tel que la somme de ces produits soit moindre
que pour tout autre instant. [pp. 477-478]


This may be represented symbolically as 


(iv)  $\int |x| \, y \, dx = \text{minimum}$,

the integration being taken over all possible values of x.
Taking, then, the courbe des probabilités to be


\[ y = \varphi(x)\,\varphi'(p - x)\,\varphi''(p' - x) \cdots , \]


where $x \in [-f, c - f]$, Laplace first makes the substitution $x = z - f$, so
that $z \in [0, c]$ (though, as Sheynin [1977, p. 8] points out, he does not
drop the assumption of evenness, so that in fact $|z| < c$). It is next noted
that the probabilities of the different values of z are proportional to y, or
$\varphi(z - f)\,\varphi'(p - z + f) \cdots$, the proportionality constant being
denoted by k. If h is “la valeur de z que l’on doit prendre pour le véritable
instant du phénomène” [p. 478], then the last condition mentioned above
requires the minimization of


\[ k \int_0^h (h - z)\, y \, dz + k \int_h^c (z - h)\, y \, dz . \]


Differentiation with respect to h yields


\[ \int_0^h y \, dz = \int_h^c y \, dz . \]


The ordinate corresponding to this value of h, which determines the mean
to be chosen, thus divides the area under the courbe des probabilités,
between z = 0 and z = c, into two equal parts. The result, Laplace notes, is
the “milieu de probabilité” [p. 479] (see condition (i))*”.
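
The equivalence between minimizing the mean absolute error and halving the area under the curve can be checked numerically. What follows is only an illustrative sketch, not Laplace's procedure: the curve y(z) = z^3 (1 - z) on [0, 1], the grid size and the function names are assumptions chosen for the example.

```python
# Sketch: on a discretised "courbe des probabilités" over [0, c], the point h
# that minimises the expected absolute error coincides with the median.
# The curve below is a hypothetical example, not taken from Laplace.

def expected_abs_error(h, zs, ws):
    """Sum of |z - h| weighted by the normalised ordinates."""
    return sum(w * abs(z - h) for z, w in zip(zs, ws))

def median_point(zs, ws):
    """First grid point at which the cumulative weight reaches one half."""
    total = 0.0
    for z, w in zip(zs, ws):
        total += w
        if total >= 0.5:
            return z
    return zs[-1]

c = 1.0
m = 1000
zs = [c * (i + 0.5) / m for i in range(m)]        # midpoints of the grid cells
raw = [z ** 3 * (1.0 - z) for z in zs]            # assumed skewed curve y(z)
norm = sum(raw)
ws = [r / norm for r in raw]                      # normalised ordinates

h_min = min(zs, key=lambda h: expected_abs_error(h, zs, ws))
h_med = median_point(zs, ws)
print(h_min, h_med)
```

On this grid the two values agree to within the grid spacing, which is all that condition (iv) and the halving of the area require.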

In Article XXXI Laplace discusses the difference between the cases in 
which the laws of facility of the errors of observation are known and those 
in which they are unknown. In the former case it follows from the preceding 
article that the question of the determination of the mean of several obser- 
vations reduces to the division of a given surface into two equal parts (a 
problem in pure Analysis). However, when the laws of facility are unknown,
it is the calculus of probabilities that is needed to supply this ignorance. In
this case we know from Article XIII that if ±a, ±a', ±a'', ... are the limits
of the error of the first, second, third, ... observations, one must suppose


\[ \varphi(x) = \frac{1}{2a}\log\frac{a}{x} , \qquad
   \varphi'(x) = \frac{1}{2a'}\log\frac{a'}{x} , \qquad \ldots \]


(In his thirteenth article Laplace explains this choice in the following way: 


il est naturel de penser que les mêmes erreurs, en plus et en
moins, sont également probables et que leur facilité est d’autant
moindre qu’elles sont plus grandes; si l’on n’a aucune autre
donnée, relativement à leur facilité, on retombe évidemment
dans le cas du problème précédent: il faut donc supposer alors
la possibilité, tant de l’erreur positive x, que de l’erreur négative
−x, égale à (1/2a) log (a/x); et c’est cette loi de possibilité dont
il faut partir, dans la recherche du milieu que l’on doit choisir
entre les résultats de plusieurs observations. [p. 413])
Once again, only the inevitable difficulties of Analysis remain, though one
must admit, says Laplace, that they make the preceding method very dif- 
ficult to use. 

His object here, Laplace states, has been rather to make known what 
light the analysis of chance (“hasards”) can shed on this matter, than to 
present to observers a method both practical and easy to use — a method 
that, however, can be used on very delicate occasions. 

Laplace starts his thirty-second article by pointing out “comme il est 
facile de s’en assurer” [p. 480], that the ordinary rule for the arithmetic 
mean arises from this method when a = a' = a'' = ... = ∞. He proposes
here, however, to give a much more general theorem, showing that this rule 
always results under the following assumptions: 


1° que la loi de facilité des erreurs est la même pour toutes les
observations;

2° que les mêmes erreurs, soit en plus, soit en moins, sont
également possibles;

3° qu’elles peuvent être infinies et que la fonction qui exprime
leurs facilités ne décroît d’une quantité finie que lorsque x est
infini, mais qu’alors elle va toujours en diminuant jusqu’au point
de devenir nulle. [p. 480]


Denoting by $\varphi(\alpha x)$ the law of facility of the errors of observation,
α being infinitely small, and by q the value of $\varphi(\alpha x)$ when αx = 0
(and as a result, whenever x is finite), one sees that the ordinate of the
courbe des probabilités from −x = 0 to −x = ∞ is


\[ y = \varphi(\alpha x)\,\varphi(\alpha p + \alpha x)\,\varphi(\alpha p' + \alpha x) \cdots \]


(Note that Laplace is once again assuming that all priors are the same.) If
we suppose that there are n observations and if we ignore terms of order
α², this last expression becomes*®


\[ y = [\varphi(\alpha x)]^n + \alpha \Big(\sum p\Big) [\varphi(\alpha x)]^{n-1}
       \frac{d\varphi(\alpha x)}{d(\alpha x)} , \qquad (21) \]


where
\[ \sum p = p + p' + \cdots + p^{(n-1)} . \]
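Laplace’s integration I find rather confusing: his answer, however, is correct,
as the following argument shows.
Firstly, from (21), and since φ is even, we have


\[ \int_{-\infty}^{0} y \, dx
   = \int_0^\infty \Big\{ [\varphi(\alpha x)]^n + \alpha \Big(\sum p\Big)
     [\varphi(\alpha x)]^{n-1} \frac{d\varphi(\alpha x)}{d(\alpha x)} \Big\} dx \]

\[ = A + \Big(\sum p\Big) \frac{[\varphi(\alpha x)]^n}{n} \Big|_0^\infty \]

\[ = A - \Big(\sum p\Big) q^n / n , \]


where $A = \int_0^\infty [\varphi(\alpha x)]^n dx$, since $\varphi(\alpha x) = q$
when x = 0 and $\varphi(\alpha x) = 0$ for x = ∞. Consider next
the interval $[0, p^{(n-1)}]$. Recalling the definition of q, one sees that one may
suppose here that $\varphi(\alpha x) = \varphi(\alpha p - \alpha x) = \cdots = q$.
Thus the ordinate y is just $q^n$, and


\[ \int_0^{p^{(n-1)}} y \, dx = p^{(n-1)} q^n . \]


Finally, for $x \in [p^{(n-1)}, \infty)$, one has


\[ y = \varphi(\alpha x)\,\varphi(\alpha x - \alpha p)\,\varphi(\alpha x - \alpha p') \cdots \]

\[ = [\varphi(\alpha x)]^n - \alpha \Big(\sum p\Big) [\varphi(\alpha x)]^{n-1}
     \frac{d\varphi(\alpha x)}{d(\alpha x)} . \]


Now
\[ \int_{p^{(n-1)}}^\infty [\varphi(\alpha x)]^n dx
   = \int_0^\infty [\varphi(\alpha x)]^n dx - \int_0^{p^{(n-1)}} [\varphi(\alpha x)]^n dx
   = A - p^{(n-1)} q^n \]
and
\[ \alpha \int_{p^{(n-1)}}^\infty [\varphi(\alpha x)]^{n-1}
   \frac{d\varphi(\alpha x)}{d(\alpha x)} \, dx
   = \frac{[\varphi(\alpha x)]^n}{n} \Big|_{p^{(n-1)}}^\infty = -\frac{q^n}{n} . \]
Thus


\[ \int_{p^{(n-1)}}^\infty y \, dx = A - p^{(n-1)} q^n + \frac{1}{n} \Big(\sum p\Big) q^n . \]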


Hence the entire area under the courbe des probabilités is


\[ A - \frac{1}{n}\Big(\sum p\Big) q^n + p^{(n-1)} q^n + A - p^{(n-1)} q^n
   + \frac{1}{n}\Big(\sum p\Big) q^n = 2A . \]

If we now denote by h the abscissa whose ordinate divides this area into
two equal parts, the part of the area that is to the left of this ordinate is
clearly


\[ A - \frac{1}{n}\Big(\sum p\Big) q^n + h q^n \]


(because $h \in [0, p^{(n-1)}]$), and on setting this equal to A, we get


\[ h = \frac{1}{n} \sum p , \]


which yields the same value for h as “la règle des milieux arithmétiques”
[p. 481].
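
The limiting argument can be illustrated numerically. The sketch below is not Laplace's calculation: it assumes the particular error law φ(u) = exp(-|u|), which is even and decreases to zero at infinity, and the observations 0, 1, 2, 7 are hypothetical. For small α the abscissa halving the area under y should lie close to the arithmetic mean 2.5 rather than to the sample median 1.5.

```python
import math

def posterior_median(obs, alpha, steps=20000):
    """Abscissa halving the area under y(x) = prod exp(-alpha * |x - o|).

    The error law exp(-|u|) is an assumption made for this illustration.
    """
    obs = sorted(obs)
    n, lo, hi, s = len(obs), obs[0], obs[-1], sum(obs)
    # Closed-form areas of the two tails outside [lo, hi].
    left = math.exp(-alpha * (s - n * lo)) / (n * alpha)
    right = math.exp(-alpha * (n * hi - s)) / (n * alpha)
    # Midpoint rule for the area between the extreme observations.
    dx = (hi - lo) / steps
    xs = [lo + (k + 0.5) * dx for k in range(steps)]
    ys = [math.exp(-alpha * sum(abs(x - o) for o in obs)) for x in xs]
    half = 0.5 * (left + right + sum(ys) * dx)
    acc = left
    for x, y in zip(xs, ys):
        acc += y * dx
        if acc >= half:
            return x
    return hi

observations = [0.0, 1.0, 2.0, 7.0]      # hypothetical data, mean 2.5
h_flat = posterior_median(observations, alpha=1e-4)
print(h_flat)
```

Increasing α (a less flat error law) moves the result away from the arithmetic mean, which is in keeping with Laplace's remark that the rule rests on rather implausible suppositions.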


Les suppositions qui nous ont conduit à ce résultat étant hors
de toute vraisemblance, on voit combien il est nécessaire, dans
les occasions délicates, de faire usage de la méthode que nous
avons proposée. [pp. 481-482]


In the final article of this memoir, Laplace considers the following prob- 
lem: suppose that in repeated checks of an instrument one has found n 
different errors p, p', p'', ... that are repeated i, j, k, ... times respectively
and that have respective facilities $x_1, x_2, x_3, \ldots$. The probability of the
system of facilities will then be


\[ x_1^i x_2^j x_3^k \cdots dx_1 \, dx_2 \, dx_3 \cdots \Big/
   \int x_1^i x_2^j x_3^k \cdots dx_1 \, dx_2 \, dx_3 \cdots , \]


the integral being taken over all possible values of $x_1, x_2, x_3, \ldots$. Repeated
integration shows that the probability that the facility $x_1$ lies between given
limits $\theta_1$ and $\theta_2$ (say) is


\[ \int_{\theta_1}^{\theta_2} x_1^i (1 - x_1)^{j + k + \cdots + n - 2} \, dx_1 \Big/
   \int_0^1 x_1^i (1 - x_1)^{j + k + \cdots + n - 2} \, dx_1 , \]


an expression we have already seen in Article XVIII. A similar example is 
adduced (see p. 483), from which a simple rule, based on this result, follows 
for the correction of the instrument [p. 484]. 

This concludes the memoir, one that “deserves to be regarded as very 
important in the history of the subject” (Todhunter [1865, art. 905]). While
the methods of approximation of definite integrals derived here are certainly 
important, the memoir is perhaps more noteworthy from the viewpoint of 
our present work for its applications of Bayes’s and related results. 


7.7 Sur les approximations des formules 
(suite) 


This “Mémoire sur les approximations des formules qui sont fonctions de
très grands nombres (suite)”, published in 1786 in the volume for 1783 of
the Mémoires de l’Académie royale des Sciences de Paris, pp. 423-467, is
a continuation (Article IV, Application de l’analyse précédente à la théorie
des hasards, in fact) of an earlier memoir of the same title published in
1785 in the volume of the Mémoires for 1782, pp. 1-88: the numbering 
here is a continuation of that of this earlier memoir, a summary of which 
is presented in Appendix 7.3 to this chapter. 

In Number XXXII, the first section of the memoir, Laplace repeats cer- 
tain elementary definitions and probabilistic notions that he had already 
stated in earlier writings. Here he gives a precise definition of the term 
“chance”, viz.°° 


le mot hasard n’exprime donc que notre ignorance sur les causes
des phénomènes que nous voyons arriver et se succéder sans
aucun ordre apparent. [p. 296]


Once again he repeats that “la probabilité est relative en partie à cette
ignorance, en partie à nos connaissances” [p. 296], and reiterates the scope
of the theory of chances, viz.


la théorie des hasards consiste donc à réduire tous les événements
qui peuvent avoir lieu relativement à un objet, dans un certain
nombre de cas également possibles, c’est-à-dire tels que nous
soyons également indécis sur leur existence, et à déterminer le
nombre des cas favorables à l’événement dont on cherche la
probabilité. [p. 296]


Of the number of favourable cases he says further that “le rapport de ce
nombre à celui de tous les cas possibles est la mesure de cette probabilité”
[p. 296], though I doubt whether he in fact finds a distinction between 
probability and the measure of probability. 

Much of this article is concerned with the influence of past events on the 
probability of future events (a topic introduced in this initial number) and 
in this respect the memoir is irrelevant to our study of “inverse inference” 
— apart, of course, from any pertinent detail on the rule of succession. 

In a short second number (XXXIII) Laplace repeats a formula given in
his Mémoire sur les probabilités, namely $\Pr[e \mid E] = \Pr[e \cap E] / \Pr[E]$, a
formula that, he stresses, is basic to the whole theory of the probability of 
causes and of future events. 

In Number XXXIV, under the assumption that each of n causes
$e, e^{(1)}, \ldots, e^{(n-1)}$ has (prior) probability 1/n, and denoting by
$a, a^{(1)}, \ldots, a^{(n-1)}$ the (posterior) probabilities of an event E given
these causes, Laplace deduces from the formula of the preceding number that®°


\[ P^{(r)} = \Pr[e^{(r)} \mid E] = a^{(r)} \Big/ \sum_{i=0}^{n-1} a^{(i)}
   \qquad (a^{(0)} \equiv a) , \]


a result that we recognize as a discrete Bayes’s formula.
This result is then applied to a “sampling with replacement” scheme, an
application that is worth noting since in it Laplace seems to assign equal
probabilities a priori to combinations rather than permutations. An urn
contains three balls that are white or black: m drawings (with replacement)
result in m whites (event E, say). Denoting by $e, e^{(1)}, e^{(2)}, e^{(3)}$ the
following four hypotheses respectively


All three balls are white, 

Two balls are white and one is black, 
One ball is white and two are black, 
All three balls are black, 


Laplace says that the probabilities of E conditional on each of these
hypotheses are 1, $(2/3)^m$, $(1/3)^m$ and 0 respectively. Thus the posterior
probabilities are


\[ \frac{3^m}{3^m + 2^m + 1} , \qquad \frac{2^m}{3^m + 2^m + 1} , \qquad
   \frac{1}{3^m + 2^m + 1} , \qquad 0 \]


respectively.

It seems that Laplace is here considering the ordered triples (W, W, B), 
(W, B,W), (B, W,W) as indistinguishable, a situation that we may view 
as analogous to one in which we are presented with four indistinguishable 
urns, one of each of the four possible compositions {W, W,W}, {W, W, B}, 
{W, B, B}, {B, B, B}, from which one is chosen (at random) for sampling, 
rather than to one in which three balls are drawn “at random” from a 
very large population of equal numbers of black and white balls, which 
chosen three are then placed in an urn for further sampling (in this latter 
case the probabilities of the four different compositions possible would be 
1/8, 3/8, 3/8, 1/8). The attribution of equal probabilities to combinations (and also
to permutations) was suggested by W.E. Johnson in 1924 and fruitfully 
exploited in his theory of eduction. 
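
The effect of the two prior assignments on the three-ball urn can be compared directly. The following is a small sketch using exact rational arithmetic; the function name `posterior` and the choice m = 2 are assumptions made for illustration only.

```python
from fractions import Fraction
from math import comb

def posterior(m, prior):
    """Posterior over 3, 2, 1, 0 white balls after m white draws (m >= 1)."""
    likelihood = [Fraction(k, 3) ** m for k in (3, 2, 1, 0)]
    joint = [l * p for l, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]

# Equal prior probabilities for the four compositions (combinations).
equal = [Fraction(1, 4)] * 4
# Prior obtained by counting ordered triples (permutations): 1/8, 3/8, 3/8, 1/8.
by_permutations = [Fraction(comb(3, k), 8) for k in (3, 2, 1, 0)]

m = 2
post_equal = posterior(m, equal)
post_perm = posterior(m, by_permutations)
print(post_equal)
print(post_perm)
```

With equal priors the result reproduces the fractions with denominator 3^m + 2^m + 1 given above; under the permutation-based prior the composition with two white balls becomes the most probable one for m = 2.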

Laplace begins his Number XXXV with the following words: 


la possibilité de la plupart des événements simples est inconnue
et, considérée a priori, elle nous paraît également susceptible
de toutes les valeurs depuis zéro jusqu’à l’unité; mais, si l’on
a observé un résultat composé de plusieurs de ces événements,
la manière dont ils y entrent rend quelques-unes de ces valeurs
plus probables que les autres. [p. 302]


Expanding on this latter point, he denotes by x the possibility of a simple
event, and by y [= y(x)] the probability (obtained from “la théorie connue
des hasards” [p. 302]) of an observed result. It then follows, he asserts, from
Number XXXIV that the probability of x will be


égale à une fraction dont le numérateur est y et dont le dénominateur
est la somme de toutes les valeurs de y. [p. 302]

Multiplication by dx and appropriate integration then show that the
probability that x lies between θ and θ' is


\[ \int_\theta^{\theta'} y \, dx \Big/ \int_0^1 y \, dx . \qquad (22) \]


The final paragraph of this Number is important in that it indicates
Laplace’s reason for concentrating on equally probable causes. Suppose
that the different values of x (“considérées indépendamment du résultat
observé” [p. 303]) are not equally possible, but that their probability can
be expressed by z = z(x). Laplace suggests that one then replace y by yz
in the preceding formula, which amounts to supposing all the values of x
equally possible and to considering the result observed as being formed of
two independent results of probabilities y and z.


On peut donc ramener de cette manière tous les cas à celui où
l’on suppose une égale possibilité aux différentes valeurs de x
et, par cette raison, nous adopterons cette hypothèse dans les
recherches suivantes. [p. 303]


Note that in this case Laplace is still finding Pr[θ < x < θ'] and not
Pr[θ < z(x) < θ']. Of course, the usual problem that arises with the
non-uniform prior is that one does not know what it is, and this difficulty is not
solved by Laplace’s proposal.
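
Formula (22), and the replacement of y by yz, can be sketched numerically. In the example below the likelihood y(x) = x^7 (1 - x)^3, the weight z(x) = 2(1 - x), the limits 0.5 and 0.9 and the function name are all assumptions chosen for illustration; a simple midpoint rule stands in for Laplace's analytical work.

```python
def interval_probability(y, lo, hi, z=lambda x: 1.0, steps=100000):
    """Midpoint-rule estimate of int_lo^hi y z dx / int_0^1 y z dx."""
    dx = 1.0 / steps
    num = den = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        w = y(x) * z(x) * dx
        den += w
        if lo < x < hi:
            num += w
    return num / den

# Hypothetical observed result: 7 occurrences and 3 failures of a simple event.
y = lambda x: x ** 7 * (1 - x) ** 3

uniform = interval_probability(y, 0.5, 0.9)                         # formula (22)
weighted = interval_probability(y, 0.5, 0.9, z=lambda x: 2 * (1 - x))  # y replaced by yz
print(uniform, weighted)
```

In both calls the quantity computed is the probability that x itself lies between the limits, in agreement with the remark that Laplace is still finding Pr[θ < x < θ'].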

In Number XXXVI Laplace considers the evaluation of (22), or more
specifically, Pr[x < θ], where θ is any number less than a (the most probable
value of x, or that which maximizes y). This evaluation is accomplished
by series expansion, two different results being given depending on the
proximity of θ to a.

In his next number Laplace continues with the preceding investigation,
finding®! Pr[a − θ < x < a + θ']. This probability is shown to be


— e dt 
Vt Jo 


when θ and θ' are very small, and is given more generally by


pe mile 
Va 


when log y is of order 1/α and 0 < λ < 1; in fact,
$\sqrt{\log Y - \log J} = \alpha^{-\lambda/2}$,
where Y = y(a) and J = y(a − θ) = y(a + θ'). This leads to the following
theorem:


la probabilité que la possibilité des événements simples est comprise
entre des limites qui se resserrent de plus en plus approche
sans cesse de l’unité, de manière que, dans la supposition d’un
nombre infini d’événements simples, ces deux limites venant
à se réunir, et la probabilité se confondant avec la certitude,
la véritable possibilité des événements simples est exactement
égale à celle qui rend le résultat observé le plus probable.
[pp. 307-308]


Laplace stresses the two approximations found here (one relative to the 
limits that contain the value of x and that contract, and the other relative 
to the probability that x is found between these limits, a probability that
approaches unity or certainty) and points out that these approximations 
differ from the ordinary ones, “dans lesquelles on est toujours assuré que 
le résultat est compris dans les limites qu’on lui assigne” [p. 308]. 
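
The content of the theorem can be illustrated with a uniform prior and a binomial-type result. The sketch below is an assumption-laden illustration, not Laplace's series expansion: the counts 6k successes and 4k failures, the half-width 0.05 and the function name are hypothetical.

```python
import math

def mass_near_mode(p, q, eps, steps=20000):
    """Posterior mass of x within eps of a = p/(p+q) when y = x^p (1-x)^q."""
    a = p / (p + q)
    dx = 1.0 / steps
    xs = [(k + 0.5) * dx for k in range(steps)]
    logs = [p * math.log(x) + q * math.log(1.0 - x) for x in xs]
    peak = max(logs)                          # rescale to avoid underflow
    ys = [math.exp(v - peak) for v in logs]
    inside = sum(y for x, y in zip(xs, ys) if abs(x - a) < eps)
    return inside / sum(ys)

# Fixed limits a - 0.05 and a + 0.05, growing numbers of simple events.
masses = [mass_near_mode(6 * k, 4 * k, 0.05) for k in (1, 10, 100, 1000)]
print(masses)
```

As the number of simple events grows the mass inside the fixed limits approaches unity, so that the limits themselves could be drawn ever closer to the value of x that makes the observed result most probable.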
Number XXXVIII is devoted to what is essentially a generalization of 
the problem of Number XXXVI above, leading to a sort of double Bayes’s 
integral. Thus considering the question of two events, each composed of a
large number of simple events of the same type, occurring (independently)
in two different places, Laplace denotes by


x :   the possibility of the simple event in the first place;
y :   the function of x expressing the probability of the observed
      result in that place;
a :   the value of x corresponding to the maximum of y;
x' :  the possibility of the simple event in the second place;
y' :  the function of x' expressing the probability of the observed
      result in that place;
a' :  the value of x' corresponding to the maximum of y'.


Denoting further by P the probability that the possibility of the simple
event is greater in the first place than in the second, Laplace claims that,
analogously to the discussion of Number XXXV,


\[ P = \int_0^1 \! \int_0^x y \, y' \, dx' \, dx \Big/
       \int_0^1 \! \int_0^1 y \, y' \, dx' \, dx . \]


One might well rewrite this as


\[ P = \Pr[x \ge x'] = \int_0^1 \Pr[x' \le u \mid x = u] \, f_x(u) \, du . \]


Compare also the ratio of the integrals given in Article XXVI of Laplace’s
Mémoire sur les probabilités. The rest of this Number is devoted to
approximations to these integrals.
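
The double integral for P is easily evaluated numerically when the observed results are of binomial type. The following sketch assumes y = x^p (1 - x)^q and y' = x'^p' (1 - x')^q' with small hypothetical counts; the function name and the grid size are likewise assumptions, and no attempt is made to reproduce Laplace's approximations.

```python
def prob_first_greater(p, q, p2, q2, steps=4000):
    """Estimate P = Pr[x > x'] for y = x^p (1-x)^q and y' = x'^p2 (1-x')^q2."""
    dx = 1.0 / steps
    xs = [(k + 0.5) * dx for k in range(steps)]
    y1 = [x ** p * (1 - x) ** q for x in xs]
    y2 = [x ** p2 * (1 - x) ** q2 for x in xs]
    num = 0.0
    inner = 0.0                    # running value of the inner integral of y'
    for a, b in zip(y1, y2):
        # Half of the current cell of y' is counted as lying below x.
        num += a * (inner + 0.5 * b * dx) * dx
        inner += b * dx
    den = sum(y1) * dx * inner     # inner now holds the integral of y' over [0, 1]
    return num / den

# Hypothetical counts: 12 of 20 in the first place, 9 of 20 in the second.
p12 = prob_first_greater(12, 8, 9, 11)
p21 = prob_first_greater(9, 11, 12, 8)
print(p12, p21)
```

Interchanging the two places gives the complementary probability, and identical observations in both places give one half, as the symmetry of the double integral requires.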
As an application of the preceding result, Laplace addresses himself, in
Number XXXIX, once again to the question of births (cf. Article XXVI of
his Mémoire sur les probabilités). He begins by deriving an expression for
the probability that the possibility x of the birth of a boy does not exceed
any given θ (p + q births having been observed, with p much greater than
q), viz.


Pr [x < 6] 
— V2n [p — (p+ q)0] pPtt/2qati/2 [p — (p + 9) 6]? | 


Putting θ = 1/2 we get the probability that the possibility of the birth of a
boy is less than that of a girl, viz. 


Am 22) i | 
(p — q)aptat3/2ppti/2gatl/2./q (p — q)? 


As an example Laplace considers the births in London, Paris and the 
Kingdom of Naples (excluding Sicily), and he determines the respective 
probabilities numerically. Todhunter [1865, art. 909] regards the present 
exposition as a “much better investigation” than that presented in the 
Mémoire sur les probabilités. 

Having observed that the ratios of births of boys to girls in London 
and Paris are 19:18 and 26:25 respectively, Laplace proposes in Number 
XL to determine with what likelihood the observations indicate that the 
possibility of the birth of a boy in the former city is greater than in the 
latter: this is a particular case of the theory of Number XXXVIII, with y 
(in Paris) being given by 


\[ y = \binom{p+q}{p} x^p (1 - x)^q \]


and y' (in London) by

\[ y' = \binom{p'+q'}{p'} x'^{\,p'} (1 - x')^{q'} . \]


He finds that there is a more than 400,000 to 1 chance that there is a cause 
in London besides that (“de plus qu’a”) in Paris facilitating the births of 
boys. A similar comparison is effected between Paris and the Kingdom of 
Naples, the probability that the possibility of the birth of a boy in the 
former is greater than in the latter being about 1/100. 

In his Number XLI Laplace turns his attention to the question of the
probability of future events, estimated (“prise”) from past events, supposing
that, having observed a result composed of any number whatsoever of
simple events, one wishes to determine the probability of a future result
composed of the same events. Denoting by x the possibility of the simple
events, by y the corresponding probability of the observed result, and by
z that of the future result (y and z both being functions of x), Laplace
deduces from Number XXXIV that the probability P of the future event,
given (“prise du”) the observed result, is


\[ P = \int_0^1 y z \, dx \Big/ \int_0^1 y \, dx . \]


As an illustration Laplace considers the case of an urn containing an 
infinite number of white and black balls, from which one white ball has 
been drawn. What is the probability P that the next ball drawn will also 
be white? If one denotes by x the ratio of white balls in the urn to the total 
number of balls, “il est clair que x sera la probabilité, tant de l’événement 
observé que de l’événement futur” [p. 326], and one has


\[ P = \int_0^1 x^2 \, dx \Big/ \int_0^1 x \, dx = \frac{2}{3} . \]


Next Laplace considers the case of drawing one white ball, followed (in
the future) by a sequence of n black balls. In this case°?


\[ P = \int_0^1 x (1 - x)^n \, dx \Big/ \int_0^1 x \, dx = 2 / [(n+1)(n+2)] . \]


If, however, white and black balls are [known to be]®* in equal numbers in
the urn, $P = 1/2^n$, a value that is less than that just obtained for n ≥ 4.
From this follows the result that, although the first draw makes it probable
that there are more white than black balls, the probability of getting four
black balls in the following four draws is much greater than if one supposes
equal numbers of white and black balls. The apparent paradox is due, says
Laplace, to the fact that (in modern notation)

\[ \Pr[B_1 B_2 B_3 \cdots] = \Pr[B_1] \, \Pr[B_2 \mid B_1] \, \Pr[B_3 \mid B_1 B_2] \cdots , \]

the “probabilités partielles” [p. 327] being always increasing and tending
to 1 as n → ∞.
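
These values can be verified exactly. The sketch below uses rational arithmetic and the elementary integral of x^a (1 - x)^b over [0, 1]; the helper names are assumptions introduced for the illustration.

```python
from fractions import Fraction
from math import comb

def integral(a, b):
    """Exact integral of x^a (1-x)^b over [0, 1], i.e. a! b! / (a+b+1)!."""
    return Fraction(1, (a + b + 1) * comb(a + b, a))

# One white ball drawn; probability that the next ball is also white.
p_white_again = integral(2, 0) / integral(1, 0)

def p_blacks(n):
    """Probability of n black balls in succession after one white ball."""
    return integral(1, n) / integral(1, 0)

# Smallest n for which the known equal-numbers value 1/2^n falls below p_blacks(n).
first_n = next(n for n in range(1, 50) if Fraction(1, 2 ** n) < p_blacks(n))

# "Probabilités partielles": chance of a further black after k blacks.
partial = [p_blacks(k + 1) / p_blacks(k) for k in range(6)]
print(p_white_again, first_n, partial)
```

The partial probabilities obtained in this way are (k+1)/(k+3), which increase steadily towards 1 and so account for the apparent paradox.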

This discussion is continued in Number XLII, but now it is supposed 
that the observed result (as well as the future one) is composed of a very 
large number of simple events: various approximations are derived. 

In Number XLIII Laplace returns to the problem of births, and defines


p :   the number of births of boys (in Paris);
q :   the number of births of girls;
2n :  the annual number of births;
x :   the possibility of the birth of a boy.

Denoting further by z the sum of the first n terms of the expansion


\[ x^{2n} + 2n \, x^{2n-1} (1 - x) + \frac{2n(2n-1)}{1 \cdot 2} x^{2n-2} (1 - x)^2 + \cdots , \]


which sum represents the probability that the number of boys will, in each
year, prevail over that of the girls, and $z^i$ being the probability that this
superiority will be maintained during i consecutive years, one finds that
the true probability P that this will happen is, by Number XLI,


\[ P = \int_0^1 z^i x^p (1 - x)^q \, dx \Big/ \int_0^1 x^p (1 - x)^q \, dx . \]


The rest of this number is taken up with approximations for these integrals,
and a numerical example is adduced.

At the start of the final number of this article Laplace relates the present
memoir to its predecessor in the following words:


les recherches précédentes suffisent pour faire voir les avantages
de l’analyse exposée au commencement de ce Mémoire, dans
la partie de la théorie des hasards, où il s’agit de remonter
des événements observés à leurs possibilités respectives et de
déterminer la probabilité des événements futurs. Cette analyse
n’est pas moins utile dans la solution des problèmes où l’on
cherche la probabilité d’un résultat formé d’un grand nombre
d’événements simples, dont les possibilités sont connues.
[pp. 334-335]


7.8 Sur les naissances 


Laplace’s memoir®®, “Sur les naissances, les mariages et les morts à Paris,
depuis 1771 jusqu’en 1784, et dans toute l’étendue de la France, pendant
les années 1781 et 1782”, published on pages 693-702 of the same volume
as that in which the memoir discussed in the previous section appears®®,
is devoted to an examination of the subjects of its title from the point of
view of a


théorie nouvelle et encore peu connue, celle de la probabilité
des événements futurs prise des événements observés. [p. 37]


By this means Laplace proposes to consider the following problem: suppose 
that, on the basis of past censuses, the ratio of births to population size 
is known for a given period in a large number of parishes in all provinces 
of France, these parishes being chosen in such a way that the birth-death 
ratios there found are the same as that in the whole kingdom. If, in addition, 
one knows the number of births in a given period in the whole of France, 
how should one estimate the total population size, and what can one say
about the error incurred in such estimation®”?

To solve this problem, Laplace considers the case of an urn containing an 
infinite number of white and black balls in unknown ratio. A preliminary 
drawing from this urn results in p white and q black balls, while a second 
drawing yields q’ black and an unknown number of white balls, a number 
that is most naturally estimated by pq'/q. Denoting the true unknown
number by P', our aim is to find


\[ \Pr[\, |P' - pq'/q| < pq'\omega/q \mid p, q, q' \,] \]


for given ω, p, q and q'.

Laplace’s reasoning seems somewhat confused: the following is an attempt
at understanding it. Let X be the unknown ratio of white to total
number of balls originally obtaining, with P' and Q' being appropriate
random variables. Then


\[ \Pr[P' = p', Q' = q' \mid X = x] = \binom{p'+q'}{p'} x^{p'} (1 - x)^{q'} . \]

Since

\[ \Pr[Q' = q' \mid X = x] = \sum_{p'=0}^{\infty} \binom{p'+q'}{p'} x^{p'} (1 - x)^{q'}
   = (1 - x)^{-1} , \]

it follows that

\[ \Pr[P' = p' \mid Q' = q', X = x]
   = \Pr[P' = p', Q' = q' \mid X = x] \,/\, \Pr[Q' = q' \mid X = x] \]

\[ = \binom{p'+q'}{p'} x^{p'} (1 - x)^{q'+1} , \]


and based on the first drawing we have

\[ \Pr[x < X < x + dx \mid p, q] = x^p (1 - x)^q \, dx \Big/ \int_0^1 x^p (1 - x)^q \, dx . \]
0 


Now by the definition of conditional probability (and using a less
cumbersome, though I trust sufficiently precise, notation)


\[ \Pr[p', x \mid p, q, q'] = \Pr[x \mid p, q] \, \Pr[p' \mid q', x] \, \Pr[q' \mid x]
   \,/\, \Pr[q' \mid p, q] \]


provided one assumes that (P, Q) and (P', Q') are conditionally independent
given x. Laplace in fact assumes that


\[ \Pr[p', x \mid p, q, q'] = \Pr[x \mid p, q] \, \Pr[p' \mid q', x] , \]

and here again we take note of something we have already noticed before
(see §7.3), viz. his conception of conditional distributions as being defined
only up to proportionality (it is unfortunate that the proportionality
“constant” in fact depends upon x)°®.
Using this last “equation” one obtains, finally,


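\[ \Pr[p' \mid p, q, q'] = \binom{p'+q'}{p'} \int_0^1 x^{p+p'} (1 - x)^{q+q'+1} \, dx
   \Big/ \int_0^1 x^p (1 - x)^q \, dx , \]

from which®? it follows that


\[ \Pr[0 \le P' \le s \mid p, q, q']
   = \int_0^1 x^p (1 - x)^{q+q'+1} \sum_{p'=0}^{s} \binom{p'+q'}{p'} x^{p'} \, dx
     \Big/ \int_0^1 x^p (1 - x)^q \, dx . \qquad (23) \]


If q' and s are both very large numbers (a condition that seems sufficient
but at this stage unnecessary), one has’°


\[ \sum_{p'=0}^{s} \binom{p'+q'}{p'} x^{p'} (1 - x)^{q'+1} = I_{1-x}(q'+1, s+1)
   = \int_x^1 y^s (1 - y)^{q'} \, dy \Big/ \int_0^1 y^s (1 - y)^{q'} \, dy . \]


Substitution in (23) yields


\[ \Pr[0 \le P' \le s \mid p, q, q']
   = \frac{\int_0^1 \int_x^1 x^p (1 - x)^q \, y^s (1 - y)^{q'} \, dy \, dx}
          {\int_0^1 \int_0^1 x^p (1 - x)^q \, y^s (1 - y)^{q'} \, dy \, dx} . \qquad (24) \]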


On applying the results of the memoir we have considered in §7.7 Laplace
concludes that if s is less than and very little different to pq'/q then (24)
becomes approximately

\[ \frac{1}{\sqrt{\pi}} \int_T^\infty e^{-t^2} \, dt , \]

where

\[ T^2 = \frac{[\, p/(p+q) - s/(s+q') \,]^2 \, (p+q)^3 (s+q')^3}
              {2 s q' (p+q)^3 + 2 p q (s+q')^3} . \]


A similar result follows if s is greater than, but not very different to, pq'/q,
and it follows that, approximately,


\[ \Pr[s \le P' \le s'] = 1 - \frac{1}{\sqrt{\pi}} \int_T^\infty e^{-t^2} \, dt
   - \frac{1}{\sqrt{\pi}} \int_{T'}^\infty e^{-t^2} \, dt , \]

where T is as defined above and T' is defined similarly with s replaced by
s'. If one in fact sets


\[ s = (1 - \omega) pq'/q , \qquad s' = (1 + \omega) pq'/q , \]


then, on our neglecting terms of higher order in ω, T² and T'² take on the
common value


\[ V^2 = \frac{p q q' \omega^2}{2 (p+q)(q+q')} , \]

and hence

\[ \Pr[(1 - \omega) pq'/q < P' < (1 + \omega) pq'/q]
   = 1 - \frac{2}{\sqrt{\pi}} \int_V^\infty e^{-t^2} \, dt . \]
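
The quality of this approximation can be gauged by summing the predictive probabilities directly. The sketch below rests on the formula for Pr[p' | p, q, q'] given above, in which the exponent of (1 - x) is q + q' + 1; the counts and the value of ω are hypothetical, and the function names are assumptions. Since 1 - (2/√π)∫_V^∞ e^{-t²} dt is the error function of V, Python's `math.erf` gives the approximate value.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def exact_interval_probability(p, q, q2, w):
    """Sum of Pr[p' | p, q, q'] for p' within the fraction w of p q'/q."""
    centre = p * q2 / q
    lo = math.ceil((1 - w) * centre)
    hi = math.floor((1 + w) * centre)
    base = log_beta(p + 1, q + 1)
    total = 0.0
    for p2 in range(lo, hi + 1):
        log_binom = (math.lgamma(p2 + q2 + 1) - math.lgamma(p2 + 1)
                     - math.lgamma(q2 + 1))
        total += math.exp(log_binom + log_beta(p + p2 + 1, q + q2 + 2) - base)
    return total

def laplace_approximation(p, q, q2, w):
    """1 - (2/sqrt(pi)) * integral from V to infinity of exp(-t^2), i.e. erf(V)."""
    v = w * math.sqrt(p * q * q2 / (2 * (p + q) * (q + q2)))
    return math.erf(v)

# Hypothetical counts: first drawing p = 26000, q = 1000; second drawing q' = 10000.
exact = exact_interval_probability(26000, 1000, 10000, 0.05)
approx = laplace_approximation(26000, 1000, 10000, 0.05)
print(exact, approx)
```

For counts of this size the two values should differ only slightly; with small q the skewness of the predictive distribution may be expected to make the approximation less close.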


Having noted the ease with which this result can be applied to the ques- 
tion of population size (an application to which we shall shortly return), 
Laplace now considers how p (the number of people in the original census) 
should be determined so as to obtain a large probability that the error in 
p’ (the predicted population size) is small. To this end, he supposes that 
p = iq and ωpq'/q = a, so that the expression for V² given previously yields


\[ p = \frac{2 i^2 (i+1) q'^2 V^2}{a^2 - 2 i (i+1) q' V^2} . \]

Thus p will be determined provided that i,a,q’ and V are known. A nu- 
merical example is supplied. 

Whether Laplace’s urn model is assimilable to his initial population 
problem is doubtful. Pearson [1928, app. IJ], in addition to finding the 
“treatment obscure” [p. 168], finds the interpretation of the population pa- 
rameters in terms of those of the urn to be questionable: he in fact writes 


I can see no justification for Laplace’s method of reducing the 
problem to an urn problem. I see no reason why an additional 
birth in the sample means one fewer member of the popula- 
tion. I see further no ground whatever for considering the first 
sample and France as a whole as independent samples from an 
indefinitely large population. [p. 172] 


Pearson (loc. cit.) presents an analysis of Laplace’s problem from the 
point of view of marked members of a population, the problem being re- 
stated as follows: 


A population of unknown size N is known to contain q’ affected 
or marked members. It is desired to ascertain — on the hypoth- 
esis of inverse probabilities — a measure of the error introduced 
by estimating N to be n × q'/q, where q is the number of marked
individuals in a sample of size n. [p. 172] 
His final summing-up of the problem is as follows:


I venture to think, therefore, that while Laplace’s Problem is 
most important, it does not cover the case to which he applies it, 
and that his solution of the problem itself is not really correct. 
[p. 174] 


7.9 Sur les probabilités 


In the Journal de l’École Polytechnique, VIIe et VIIIe Cahiers, juin 1812,
was published’! Leçons de Mathématiques données à l’École Normale en
1795, the dixième séance being of the above title. This popular statement
of Laplace’s views was later expanded into an introduction to his Théorie 
analytique des probabilités, and we shall postpone its consideration until we 
discuss this latter work. 


7.10 Sur les approximations des formules 


The “Mémoire sur les approximations des formules qui sont fonctions de 
très grands nombres et sur leur application aux probabilités”, published in
1810 in volume X (1809) of the Mémoires de l’Académie des Sciences, Ire
Série, pp. 353-415, is notable chiefly for its contribution to the theory of 
errors’*. However a supplement to this memoir is pertinent to our present 
purpose, and it therefore seems not inadvisable to say something about the 
memoir itself at this stage’?. 

After an introductory section, Laplace turns his attention in the first 
article to the following problem: 


on suppose toutes les inclinaisons à l’écliptique également possibles
depuis zéro jusqu’à l’angle droit, et l’on demande la probabilité
que l’inclinaison moyenne de n orbites sera comprise dans
des limites données. [p. 305]


The formulae obtained in the solution of this problem are applied in the 
second article to the inclinations of planetary orbits, the result obtained 
indicating 


avec une très grande probabilité l’existence d’une cause primitive
qui a déterminé les orbites des planètes à se rapprocher du
plan de l’écliptique ou, plus naturellement, du plan de l’équateur
solaire ... Ainsi l’existence d’une cause commune qui a dirigé
ces mouvements dans le sens de la rotation du Soleil est indiquée
par les observations avec une probabilité extrême. [p. 308]

Laplace next discusses whether or not this cause has influence on the
movement of comets, the intractability of the expressions derived leading him to
another resolution of the problem in Article III (it was for this alternative 
method that he developed characteristic functions). These new formulae 
are then applied, in Article IV, to observed cometary data, further approx- 
imations being considered in Article V. 

In the sixth article Laplace returns to the problem considered in the first 
article. He points out that this problem, 


relativement aux inclinaisons, est le même que celui dans lequel
on se propose de déterminer la probabilité que l’erreur moyenne
d’un nombre n d’observations sera comprise dans des limites
données, en supposant que les erreurs de chaque observation
puissent également s’étendre dans l’intervalle h. Nous allons
maintenant considérer le cas général dans lequel les facilités des
erreurs suivent une loi quelconque. [p. 322]


The rest of the memoir is taken up with various ramifications of earlier 
results. 


7.11 Supplément: sur les approximations des 
formules 


The “Supplément au mémoire sur les approximations des formules qui sont
fonctions de très grands nombres” was published in 1810 in the same
volume, pp. 559-565, as the preceding memoir. Here Laplace returns to a
discussion of the theory of errors, undertaking yet a further generalization
of the work of his earlier memoirs of 1774 and 1778. He proposes to
consider the problem of the combination of several means, each of which is
formed from a large number of identically distributed and independent
observations, and although he assumes, as in the Memoir, that the individual
means are normally distributed, some general discussion is also given.
Suppose that $n, n', n'', \ldots$ observations yield means
$A, A + q, A + q', \ldots$ respectively, the laws of facility of the
different errors being distinct. If $A + x$ is the true value, the error of
the mean result of the first set of $n$ observations is $-x$, the
probability of this error, by the Memoir, being


$$ \frac{1}{\sqrt{\pi}} \sqrt{\frac{k}{2k'}}\; e^{-\frac{k}{2k'} r^2} , \tag{25} $$


where $k = \int \varphi(x/h)\,dx$ ($\varphi(x/h)$ being the true probability
of the error $x$), $k' = \int (x^2/h^2)\,\varphi(x/h)\,dx$, and the limits
between which the probability of the error is required to lie are
$\pm rh/\sqrt{n}$. With $x = rh/\sqrt{n}$ and $a = \sqrt{k/2k'}/h$, (25)
becomes


$$ \frac{a\sqrt{n}}{\sqrt{\pi}}\; e^{-n a^2 x^2} . $$


If one designates by $\psi(-x), \psi'(q - x), \psi''(q' - x), \ldots$ these
diverse probabilities, the probability that the error of the first result
will be $-x$ and that the others will differ from the first by
$q, q', \ldots$ respectively, will be equal to the product

$$ y = \psi(-x)\, \psi'(q - x)\, \psi''(q' - x) \cdots $$

Once again Laplace suggests that if one constructs a curve whose ordinate 
y is equal to this product, the ordinates of this curve will be proportional to 
the probabilities of the abscissae, and for this reason “nous la nommerons 
courbe des probabilités” [p. 351]. 

Proceeding as in his memoirs of 1774 and 1778, Laplace takes as his
estimate of the mean that value of $l$ such that


$$ \int_0^l y\,dx = \int_l^\infty y\,dx , $$


this result being obtained by minimizing $\int |l - x|\, y\,dx$ with respect
to $l$. Commenting on earlier work Laplace writes


Daniel Bernoulli, ensuite Euler et M. Gauss ont prise pour cette
ordonnée la plus grande de toutes. Leur résultat coïncide avec
le précédent lorsque cette plus grande ordonnée divise l’aire de
la courbe en deux parties égales, ce qui, comme on va le voir, a
lieu dans la question présente; mais, dans le cas général, il me
paraît que la manière dont je viens d’envisager la chose résulte
de la théorie même des probabilités. [p. 352]


Both Stigler [1975, p. 506] and Sheynin [1977, p. 16] suggest that Laplace 
came to know of Gauss’s treatise Theoria Motus Corporum Coelestium of 
1809 after he had written his memoir and that this treatise might well have 
provided the impetus for the supplement. 

Laplace now returns to the case in which the means are normally
distributed. Writing $A + x = A + X + z$ he considers the likelihood

$$ y = p\,p'\,p'' \cdots e^{-p^2\pi (X + z)^2 - p'^2\pi (q - X - z)^2 - p''^2\pi (q' - X - z)^2 - \cdots} , $$

where $p = a\sqrt{n}/\sqrt{\pi}$ “et par conséquent exprimant la plus grande
probabilité du résultat donné par les observations n” [p. 352]
($p', p'', \ldots$ being similarly defined). $X$ is now chosen in such a way
that the term in $z$ in the above exponential vanishes, which has the effect
that the ordinate $y$ corresponding to $z = 0$ divides the area under the
curve into two equal parts, and at the same time is the greatest ordinate.
One has, in this case,


$$ X = (p'^2 q + p''^2 q' + \cdots)/(p^2 + p'^2 + p''^2 + \cdots) , \tag{26} $$


and thus $y$ has the form


$$ y = p\,p'\,p'' \cdots e^{-M - Nz^2} , $$


from which the effect mentioned above is immediate. Thus $A + X$ is the
desired mean between the quantities $A, A + q, A + q', \ldots$
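
The weighting in (26) may be illustrated numerically. The following Python sketch is offered only as an illustration under stated assumptions: the values given to p, p', p'' and to q, q' are invented, and the function names are hypothetical. It computes X as the average of the offsets 0, q, q', ... weighted by the squares p², p'², p''², ..., and compares the weighted sum of squares at X with its value at neighbouring points.

```python
# Illustrative sketch only: the numbers below are invented, not Laplace's.

def weighted_mean_offset(p, q):
    """X of (26): offsets q weighted by p squared (the first offset is 0)."""
    return sum(pi * pi * qi for pi, qi in zip(p, q)) / sum(pi * pi for pi in p)

def weighted_sum_of_squares(x, p, q):
    """(pX)^2 + (p'(q - X))^2 + (p''(q' - X))^2 + ..."""
    return sum((pi * (qi - x)) ** 2 for pi, qi in zip(p, q))

p = [2.0, 1.0, 3.0]     # hypothetical p, p', p''
q = [0.0, 0.6, -0.2]    # offsets 0, q, q' of the means A, A + q, A + q'
X = weighted_mean_offset(p, q)
at_X = weighted_sum_of_squares(X, p, q)
```

Any displacement of X, in either direction, should increase the weighted sum of squares.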

Laplace also notes that the value of $X$ given in (26) is that which
minimizes


$$ [pX]^2 + [p'(q - X)]^2 + [p''(q' - X)]^2 + \cdots $$

(or $[p|X|]^2 + [p'|q - X|]^2 + \cdots$), a function that is described as


la somme des carrés des erreurs de chaque résultat, multipliées 
respectivement par la plus grande ordonnée de la courbe de 
facilité de ses erreurs. [p. 353] 


It is this remark that I think Stigler [1975, p. 506] considers “a Bayesian
justification for least squares”: not only, says Stigler, do “the least
squares estimates ... maximize the likelihood function, considered as a
posterior distribution, but [they] also minimize the expected posterior
error” [1975, p. 506].


This property is characterized by Laplace as follows: 


ainsi cette propriété, qui n’est qu’hypothétique lorsqu’on ne
considère que des résultats donnés par une seule observation ou
par un petit nombre d’observations, devient nécessaire lorsque
les résultats entre lesquels on doit prendre un milieu sont donnés
chacun par un très grand nombre d’observations, quelles que
soient d’ailleurs les lois de facilité des erreurs de ces
observations. C’est une raison pour l’employer dans tous les cas.
[p. 353]


He concludes by showing that

$$ \Pr\left[-T/\sqrt{N} < A + X < T/\sqrt{N}\right] = \frac{2}{\sqrt{\pi}} \int_0^T e^{-t^2}\,dt , $$


the value of $N$, by what precedes, being $\pi(p^2 + p'^2 + p''^2 + \cdots)$.


7.12 Sur les intégrales définies 


Published in the Mémoires de l’Académie des Sciences, Ire Série, Tome
XI (Ire Partie) for 1810 (published 1811), pp. 279-347, the “Mémoire sur
les intégrales définies et leur application aux probabilités, et spécialement
à la recherche du milieu qu’il faut choisir entre les résultats des
observations” has a touch of both retrospection (Laplace recalls his earlier
work on generating functions) and prospection (two references are made to an
impending work, viz. “... une théorie que je me propose de publier bientôt
sur les probabilités” [p. 360], and “un Ouvrage que je vais bientôt publier
sur les probabilités” [p. 411]). Most of the memoir is devoted to the
evaluation of certain definite integrals, but there is some discussion of
three probability problems that receive scant attention from Todhunter (he
discusses this memoir in his Articles 919-922); Laplace regards the
investigations of this memoir as being “d’une grande utilité dans la théorie
des probabilités” [p. 361].
Speaking of his calculus of generating functions Laplace says


par ce moyen, on peut déterminer avec facilité les limites de la
probabilité des résultats et des causes, indiqués par les événements
considérés en grand nombre, et les lois suivant lesquelles
cette probabilité approche de ses limites, à mesure que les
événements se multiplient. [pp. 360-361]


This research, “la plus délicate de la théorie des hasards” [p. 361], deserves, 
he says, the attention of both mathematicians and philosophers. 

We note also the following definition given in the introductory section of 
the memoir: 


j’entends par erreur moyenne la somme des produits de chaque
erreur par sa probabilité. [p. 362]


The first three articles are impertinent: we shall consider the others
seriatim.

In his fourth article, headed “Application de l’analyse précédente aux
probabilités”, Laplace presents the first of his probability problems: ’tis
what Todhunter [1865, art. 921] refers to as “the problem of the Duration 
of Play”. We shall not discuss this article here: the main tools in the so- 
lution of the problem posed are generating functions and techniques for 
evaluating certain definite integrals. 

In Article V Laplace considers the following urn problem: suppose that
two urns A and B each contain the same number n of balls, and that of the
2n balls there are as many white as black. A ball is drawn from each urn
simultaneously and replaced in the other, the contents of the urns being
shuffled before the trial is repeated. What is the probability that, after r
repetitions, there are x white balls in urn A? Laplace develops certain
formulae to solve this problem, as an application of which he considers what
is essentially an urn problem involving the hypergeometric distribution.
More precisely, he supposes that an urn C contains a vast number m of white
balls and the same number of black balls. The contents of C having been
shuffled, n balls are drawn and placed in urn A. One then places in urn B
as many white (black) balls as there are black (white) balls in A. Under the
assumptions usual for the appropriateness of this distribution, the desired
probability (i.e. that A contains x white balls) is found to be


$$ \binom{m}{x}\binom{m}{n - x} \bigg/ \binom{2m}{n} , $$


and an approximation to this probability (for large values of m, n and x)
is derived from $s! = s^{s + 1/2}\, e^{-s} \sqrt{2\pi}$.
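
By way of illustration only, the hypergeometric probability and the approximation to the factorial just mentioned may be evaluated in a few lines of Python. The sketch assumes the reading of the probability as the product of the numbers of ways of choosing x of the m white balls and n - x of the m black balls, divided by the number of ways of choosing n of the 2m balls; the function names and the numerical values are hypothetical.

```python
# Illustrative sketch; the function names and the values of n and m are invented.
from math import comb, exp, factorial, pi, sqrt

def prob_white_in_urn_a(x, n, m):
    """Chance that the n balls drawn from C (m white, m black) include x white."""
    return comb(m, x) * comb(m, n - x) / comb(2 * m, n)

def stirling(s):
    """The approximation s^(s + 1/2) e^(-s) sqrt(2 pi) to s!."""
    return s ** (s + 0.5) * exp(-s) * sqrt(2 * pi)

n, m = 10, 1000
total = sum(prob_white_in_urn_a(x, n, m) for x in range(n + 1))
ratio = stirling(20) / factorial(20)   # slightly below 1
```

The probabilities over x = 0, 1, ..., n should sum to one, and the approximation to s! falls a little short of the true value, the shortfall diminishing as s grows.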

In Article VI, headed “Du milieu qu’il faut choisir entre les résultats
des observations”, Laplace returns to a problem already considered in his
earlier memoirs, viz. the finding of a mean between the results of several
observations. He suggests firstly that one should write the observation $C$
in the form $C = m + pz$, where $z$ is the correction to the element already
approximately known. If, however, $C$ is susceptible of an error $\epsilon$,
let $C + \epsilon = m + pz$, or $\epsilon = pz - \alpha$ (where
$\alpha = C - m$). Denoting the error in the $(i+1)$th observation by
$\epsilon^{(i)} = p^{(i)}z - \alpha^{(i)}$, Laplace considers


$$ \sum_{i=0}^{s-1} \epsilon^{(i)} = z \sum_{i=0}^{s-1} p^{(i)} - \sum_{i=0}^{s-1} \alpha^{(i)} , $$


where $s$ denotes the total number of observations; and he deduces that, if
the sum of the errors is to be zero, one must have


$$ z = \sum_{i=0}^{s-1} \alpha^{(i)} \bigg/ \sum_{i=0}^{s-1} p^{(i)} $$


(the “résultat moyen des observations” [p. 388]).


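He goes on next to suppose that, rather than requiring
$\sum_{i=0}^{s-1} \epsilon^{(i)}$ to be zero, one may well look at a linear
combination of the errors


$$ q\epsilon + q^{(1)}\epsilon^{(1)} + \cdots + q^{(s-1)}\epsilon^{(s-1)} , \tag{27} $$


where $q, q^{(1)}, \ldots \in \mathbb{Z}$. Substituting
$p^{(i)}z - \alpha^{(i)}$ for $\epsilon^{(i)}$, and equating the result to
zero, we get


$$ z = \sum q^{(i)}\alpha^{(i)} \bigg/ \sum p^{(i)}q^{(i)} . $$

He then shows that the probability that (27) lies between the limits
$\pm ar$ is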


SST [on e/a ya 


where $k = 2\int_0^\infty \varphi(x)\,dx$,
$k' = \int_0^\infty x^2 \varphi(x)\,dx$, and $\varphi(x/a)$ is the (prior)
probability of an error $x$ in each observation.

Laplace next considers the “valeur moyenne de l’erreur à craindre”
[p. 392], a mean value that, he says, “est donc la somme des produits de
chaque erreur, abstraction faite du signe, par sa probabilité” [p. 393]. The
value concerned is


$$ 2a\sqrt{\frac{k'}{k\pi}}\; \frac{\sqrt{\sum q^{(i)2}}}{\sum p^{(i)}q^{(i)}} . \tag{28} $$

While the values of $p, p^{(1)}, \ldots$ are given, those of
$q, q^{(1)}, \ldots$ are arbitrary and must be determined by minimizing (28),
whence


$$ \frac{q^{(j)}}{\sum q^{(i)2}} = \frac{p^{(j)}}{\sum p^{(i)}q^{(i)}} . $$


He then argues that $q = \mu p$, $q^{(1)} = \mu p^{(1)}, \ldots,
q^{(s-1)} = \mu p^{(s-1)}$, from which it follows that $\mu$ must be chosen
so that all of the $q, q^{(1)}, \ldots$ are integers.


Then (28) becomes 
kn ; 
rp Se (29) 


$$ z = \sum p^{(i)}\alpha^{(i)} \bigg/ \sum p^{(i)2} . $$


(Expression (29) does not seem correct: it should be
$2a\sqrt{k'/(k\pi)}\,\bigl(\sum p^{(i)2}\bigr)^{-1/2}$.)


This result, he next points out, “est celui que donne la méthode des
moindres carrés des erreurs” [p. 395], and is exactly what arises on
minimizing, with respect to $z$,


$$ (pz - \alpha)^2 + (p^{(1)}z - \alpha^{(1)})^2 + \cdots + (p^{(s-1)}z - \alpha^{(s-1)})^2 . $$


Cette méthode doit donc être employée de préférence, quelle
que soit la loi de facilité des erreurs, loi dont dépend le rapport
k/k'. [p. 395]
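
The choice of multipliers proportional to the coefficients p may be checked numerically. The Python sketch below is an illustration under stated assumptions: it supposes that the mean error to be feared depends on the multipliers q only through the factor (Σq²)^{1/2}/Σpq, it uses invented values for the p’s and α’s, it takes the multipliers to be positive real numbers rather than integers, and its function names are hypothetical.

```python
# Illustrative sketch with invented data; not Laplace's own computation.
import random
from math import sqrt

def error_factor(q, p):
    """sqrt(sum q^2) / sum(p q): the part of the mean error that depends on q."""
    return sqrt(sum(qi * qi for qi in q)) / sum(pi * qi for pi, qi in zip(p, q))

def least_squares_z(p, alpha):
    """z = sum(p alpha) / sum(p^2), obtained when q is proportional to p."""
    return sum(pi * ai for pi, ai in zip(p, alpha)) / sum(pi * pi for pi in p)

p = [1.0, 2.0, 0.5, 3.0]        # hypothetical coefficients
alpha = [1.1, 1.9, 0.6, 3.2]    # hypothetical observed quantities
best = error_factor(p, p)       # multipliers equal to the coefficients
rng = random.Random(0)
trials = [error_factor([rng.uniform(0.1, 5.0) for _ in p], p)
          for _ in range(1000)]
z_hat = least_squares_z(p, alpha)
```

No randomly chosen set of positive multipliers should give a smaller factor than the proportional choice, in accordance with the Cauchy-Schwarz inequality.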


He also states (and demonstrates) that, although this law is almost always 
unknown, one may suppose k/k’ > 6. He once again stresses, as in his 
memoir of 1778, that, under the hypotheses presented there (and repeated 
on p. 396 of the present memoir), the (prior) probability of the error +z 
should be taken to be (1/2a) log(a/z). 

In the seventh article Laplace discusses, perhaps more clearly than in
the memoir of §7.11, the question of the mean of a number of sample
means, each based on a large number of observations. He supposes that an
element is given successively by the mean result of $s, s', \ldots$
observations, these means being $A, A + q, \ldots$ respectively. If $A + x$
is “l’élément vrai” [p. 398] the error from the first $s$ observations will
be $-x$. If $C$ is now equal to


$$ \sqrt{\frac{k}{k'}}\; \frac{\sqrt{\sum p^{(i)2}}}{2a} $$


if the method of least squares is used to determine the mean, or to


$$ \sqrt{\frac{k}{k'}}\; \frac{\sum p^{(i)}}{2a\sqrt{s}} $$


if the ordinary method is used, it follows from the preceding article that,
for large $s$, the probability of that error is


$$ \frac{C}{\sqrt{\pi}}\; e^{-C^2x^2} . $$


Repeating this for the other sets of data, one finds that the probability that
the errors are $-x, q - x, q' - x, \ldots$ will be


$$ \frac{C}{\sqrt{\pi}}\, \frac{C'}{\sqrt{\pi}}\, \frac{C''}{\sqrt{\pi}} \cdots e^{-C^2x^2 - C'^2(x - q)^2 - C''^2(x - q')^2 - \cdots} , \tag{30} $$


whence it follows that 


en la multipliant par dx et prenant l’intégrale depuis x = −∞
jusqu’à x = ∞, on aura la probabilité que les résultats moyens
des observations s, s’, s”, ... surpasseront respectivement de q,
q’, ... le résultat moyen des observations s. [p. 399]


If one integrates between the given limits one obtains the probability
that, the preceding condition (that of (30) I suspect) being satisfied, the
error in the first result will lie within those limits. On dividing this
probability by that of the condition itself one obtains the probability that
the error of the first result will lie within the given limits, whenever it
is certain that the given condition has actually arisen. (It is a little hard
to see what Laplace is driving at here: I suspect he is considering
something like


$$ \Pr[l_1 < \text{error}_1 < l_2 \mid \text{condition}] = \frac{\Pr[l_1 < \text{error}_1 < l_2 \;\&\; \text{condition}]}{\Pr[\text{condition}]} . ) $$


This probability is given as


$$ \int e^{-C^2x^2 - C'^2(x - q)^2 - \cdots}\,dx \bigg/ \int e^{-C^2x^2 - C'^2(x - q)^2 - \cdots}\,dx , $$


“l’intégrale du numérateur étant prise dans les limites données et celle du
dénominateur étant prise depuis x = −∞ jusqu’à x = ∞” [p. 399]. He also
shows that this probability can be written as 


1 re 
pain ees ae 
ak 


Finally he concludes that


la loi du minimum des carrés des erreurs devient nécessaire
lorsque l’on doit prendre un milieu entre des résultats donnés
chacun par un grand nombre d’observations. [p. 401]


Continuing in the style of this article Laplace writes in Article VIII 


la méthode des moindres carrés des erreurs des observations
est celle qui donne sur la correction des éléments la plus petite
erreur moyenne à craindre. [p. 401]


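The method of Article VI is extended here to the case in which there are
two elements, with $z$ being the correction of the first and $z'$ that of the
second: here the observation $C$ is supposed given by


$$ C = A + pz + qz' . $$

As in Article VI we are led to


$$ \epsilon^{(i)} = p^{(i)}z + q^{(i)}z' - \alpha^{(i)} , $$


where $\alpha = C - A$. Proceeding as in that article Laplace shows that the
corrections to be applied to the two elements are


$$ z = \frac{\sum q^{(i)2} \sum p^{(i)}\alpha^{(i)} - \sum p^{(i)}q^{(i)} \sum q^{(i)}\alpha^{(i)}}{\sum p^{(i)2} \sum q^{(i)2} - \left(\sum p^{(i)}q^{(i)}\right)^2} , \qquad
z' = \frac{\sum p^{(i)2} \sum q^{(i)}\alpha^{(i)} - \sum p^{(i)}q^{(i)} \sum p^{(i)}\alpha^{(i)}}{\sum p^{(i)2} \sum q^{(i)2} - \left(\sum p^{(i)}q^{(i)}\right)^2} , $$


and he goes on to say that these are the corrections given by the method
of least squares for errors of observations, on our minimizing


$$ \sum \left(p^{(i)}z + q^{(i)}z' - \alpha^{(i)}\right)^2 . $$


He points out that this method may be extended to any number of elements
whatsoever. Various formulae and comments, similar to those found in
Article VI, follow.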
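
The two corrections can be illustrated by a short Python sketch, given here only as an example under stated assumptions: it solves the two normal equations that arise on minimizing the sum of the squares of the errors pz + qz' - α, the data are invented (and generated without error from known corrections, so that the answer can be checked), and the function name is hypothetical.

```python
# Illustrative sketch with invented data; the function name is hypothetical.

def two_element_corrections(p, q, alpha):
    """Solve the two normal equations for the corrections z and z'."""
    spp = sum(a * a for a in p)
    sqq = sum(b * b for b in q)
    spq = sum(a * b for a, b in zip(p, q))
    spa = sum(a * c for a, c in zip(p, alpha))
    sqa = sum(b * c for b, c in zip(q, alpha))
    det = spp * sqq - spq ** 2       # must be non-zero
    z = (sqq * spa - spq * sqa) / det
    z_prime = (spp * sqa - spq * spa) / det
    return z, z_prime

p = [1.0, 2.0, 3.0, 4.0]
q = [1.0, 0.0, -1.0, 2.0]
# Error-free quantities generated from the corrections z = 2 and z' = -3.
alpha = [2.0 * a - 3.0 * b for a, b in zip(p, q)]
z, z_prime = two_element_corrections(p, q, alpha)
```

With error-free data of this kind the computed corrections should reproduce the values from which the data were generated.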


7.13 Sur les cométes 


Published in Connaissance des Temps for 1816, dated 1813, pp. 213-220,
this memoir is notable for the use it makes of posterior probability in the
solution of a problem in celestial mechanics. The problem considered (that
of the probability of a comet’s having a particular orbit, elliptic,
parabolic or hyperbolic, on the basis of certain data) has been the subject
of several papers that have shown up certain inadequacies in Laplace’s
discussion, though these seem not to have been noticed by Todhunter
[1865, art. 925]. Following Fabry [1893-1895], who has in fact considered
a more general problem than did Laplace, we choose firstly to present
Laplace’s solution (albeit in a slightly modified form) before undertaking
any criticism.

After a discussion of Herschel’s views on the origin of the comets, 


qui consiste à les regarder comme de petites nébuleuses formées
par la condensation de la matière nébuleuse répandue avec tant
de profusion dans l’univers. Les comètes seraient ainsi,
relativement au système solaire, ce que les aérolithes sont par rapport
à la Terre, à laquelle ils paraissent étrangers [p. 88],


views with which he professed himself to be in agreement, Laplace
comments in fairly broad terms on the nature of the orbits, something to
which he had applied the probability calculus. As a result of these
investigations


J’ai trouvé qu’en effet il y a un grand nombre à parier contre
l’unité qu’une nébuleuse qui pénètre dans la sphère d’activité
solaire, de manière à pouvoir être observée, décrira ou une
ellipse très allongée ou une hyperbole qui, par la grandeur de
son axe, se confondra sensiblement avec une parabole dans la
partie que l’on observe. Cette application de l’analyse des
probabilités pouvant intéresser les géomètres et les astronomes, je
vais l’exposer ici. [p. 89]


Soon after this we find the statement of the problem that I believe 
Laplace is trying to solve: 


Il faut donc déterminer quel est, dans ces limites, le rapport 
des chances qui donnent une hyperbole sensible aux chances qui 
donnent un orbe que l’on puisse confondre avec une parabole. 


[pp. 89-90] 


Here the word “limites”, as Laplace uses it, refers to the velocity at the
moment of entry of the comet into the sphere of the sun’s activity, the
magnitude and direction of which velocity lie within narrow limits.

It seems from this passage that what we want to find is (in a modern and,
I hope, sufficiently self-explanatory notation) Pr[H]/Pr[H’], where H
and H’ are two hypotheses. This is indeed what is found in the last part
of the paper: in the first part what is found is Pr[H | data]/Pr[H’ | data],
a typical Bayesian quaesitum. Laplace follows up the preceding quotation
with a sentence in which he points out the importance of the prior
distribution, writing


Il est clair que ce rapport dépend de la loi de possibilité des
distances périhélies des comètes observables ... [p. 90]


Examination of extant data shows that, beyond a certain distance equal to
the radius of the earth’s orbit, the possibilities of the perihelion
distances decrease very rapidly as these distances increase. This should be
reflected in the law of these possibilities; but this being generally
unknown, one is only able to determine the limit of the ratio concerned, or
its value in the case most favourable to visible hyperbolas.

There is then some summary discussion, with which we need not concern 
ourselves at the moment, of the results obtained in the paper, this discus- 
sion being followed by some observations on the comets of 1682 and 1770. 

Laplace now introduces some notation: 


V: the velocity of a comet at the instant at which it
   penetrates into the sphere of the sun’s activity (i.e.
   “cette partie de l’espace où l’attraction du Soleil
   est prédominante” [p. 88]);
r: the radius vector of the comet at the same instant;
a: the semi-major axis of the orbit that it proceeds
   to describe about the sun;
e: the eccentricity of the orbit;
D: the perihelion distance (of the orbit that it is going
   to describe about the sun).


Taking as the unit of mass the mass of the sun, and, as the unit of
distance, its mean distance to the earth (and ignoring the masses of the
comets and planets relative to that star) one obtains the well-known
formulae


$$ \begin{aligned}
1/a &= 2/r - V^2 \\
rV\sin\omega &= \sqrt{a(1 - e^2)} \\
D &= a(1 - e)
\end{aligned} \tag{31} $$


where $\omega$ denotes the angle that the direction of the velocity $V$ makes
with the radius vector $r$. Fabry [1893-1895, p. 35] gives these formulae
(notation slightly altered) as


$$ \begin{aligned}
k &= rV\sin\omega \\
D &= a(1 - e) \\
a(1 - e^2) &= k^2/f .
\end{aligned} $$


We shall deal most often with Laplace’s formulae, though reference to
Fabry’s will be made from time to time.

Elimination of $a$ and $e$ yields

$$ \sin^2\omega = (2D - 2D^2/r + D^2V^2)/(r^2V^2) $$


[or $f(2D - 2D^2/r + D^2V^2/f)/(r^2V^2)$], whence


$$ 1 - \cos\omega = 1 - \frac{\sqrt{1 - D/r}}{rV}\sqrt{r^2V^2(1 + D/r) - 2D} . $$


Fabry [1893-1895, p. 6] points out that this latter expression holds for
values of $\omega$ between 0 and $\pi/2$: for values in $(\pi/2, \pi)$ it is
necessary to replace the $-$ sign in the second term by $+$.
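
The elimination may be checked numerically. The Python sketch below is an illustration only, in Laplace’s units (that is, with f = 1): it computes sin²ω once by way of a and e from the three formulae for 1/a, rV sin ω and D, and once from the eliminated expression, and confirms that the stated cosine is consistent with the latter. The numerical values and the function names are hypothetical, and the parabolic case V² = 2/r, for which 1/a vanishes, is deliberately avoided.

```python
# Illustrative sketch in Laplace's units (f = 1); the test values are invented.
from math import sqrt

def sin2_omega_from_elements(D, r, V):
    """sin^2(omega) obtained by way of a and e."""
    a = 1.0 / (2.0 / r - V * V)       # 1/a = 2/r - V^2
    e = 1.0 - D / a                   # D = a(1 - e)
    return a * (1.0 - e * e) / (r * V) ** 2

def sin2_omega_eliminated(D, r, V):
    """sin^2(omega) after elimination of a and e."""
    return (2 * D - 2 * D * D / r + D * D * V * V) / (r * V) ** 2

def cos_omega(D, r, V):
    """cos(omega) for omega between 0 and pi/2."""
    return sqrt(1 - D / r) / (r * V) * sqrt(r * r * V * V * (1 + D / r) - 2 * D)

D, r, V = 1.0, 10.0, 0.5   # hypothetical perihelion distance, radius vector, speed
```

For these values the orbit is hyperbolic (a is negative), and the two routes to sin²ω should agree.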

Laplace now introduces an equiprobability assumption in the following 
words: 


Maintenant, si l’on imagine une sphère dont le centre soit celui
de la comète et dont le rayon soit égal à la vitesse V, cette
vitesse pourra être également dirigée vers tous les points de la
moitié de cette sphère comprise dans la sphère d’activité du
Soleil. [p. 92]


A simple argument (using what would today be regarded as fairly
elementary calculus) then shows that


$$ \Pr[0 < \text{direction of } V < \omega] = 1 - \cos\omega . $$


(Laplace of course does not use this notation, and choice of “<” or “≤”
seems a matter of personal preference in interpreting his results.)
Laplace obtains this result by considering the ratio


$$ \int_0^\omega 2\pi\sin\omega\,d\omega \bigg/ (2\pi) , $$


while Fabry’s argument is somewhat different. He supposes that the sphere
at whose centre the comet is, has radius 1, and that the velocities of comets
that are almost at the limit of the sphere of the sun’s activity are
uniformly directed over that sphere (rather than the hemisphere considered
by Laplace). This leads Fabry to


$$ \Pr[\beta < \text{direction } V \text{ makes with radius vector} < \beta + d\beta] = 2\pi\sin\beta\,d\beta/4\pi , $$


whence


$$ \Pr[0 < \text{direction of } V < \beta] = \frac{1}{2}\int_0^\beta \sin\beta\,d\beta . $$


Thus far the paper is unexceptionable. At this point, however, things
become a little more complicated, and I believe that certain obscurities
intrude. We shall consider the remainder of Laplace’s discussion firstly in
the case in which the prior is uniform, and secondly in the case in which it
is non-uniform.

Suppose firstly, then, that the prior on D is uniform. The limits of the 
perihelion distance corresponding to the limits 0 and w of [the direction 
of] V (the phrase in brackets is missing from the original) being 0 and D 
respectively, we find on p. 92 the following: 


en supposant donc toutes les valeurs de D également possibles,
on a pour la probabilité que la distance périhélie sera comprise
entre zéro et D


$$ 1 - \frac{\sqrt{1 - D/r}}{rV}\sqrt{r^2V^2(1 + D/r) - 2D} . $$


The reasoning here is perhaps clearer in Fabry’s paper: for a fixed value
of $V$, $\omega$ varies between 0 and $\pi/2$, and $D$ increases constantly
with $\omega$. Thus, for fixed $V$, the probability that the comet has a
perihelion distance less than some given value is the same as the
probability that the direction of the velocity is between 0 and $\omega$.

Changing notation slightly, and denoting (henceforth) random variables
by majuscules and reserving minuscules for values, we can write this latter
probability as


$$ \Pr[0 < D < d \mid V = v] = 1 - \frac{\sqrt{1 - d/r}}{rv}\sqrt{r^2v^2(1 + d/r) - 2d} \tag{32} $$

(or $(1/2)\bigl[1 - (\sqrt{1 - d/r}/rv)\sqrt{r^2v^2(1 + d/r) - 2df}\,\bigr]$
in Fabry), where $D$ and $V$ denote perihelion distance and velocity
respectively. The supposition that all values of $D$ are equally possible
seems to be irrelevant.
Laplace now suggests that 


Il faut multiplier cette valeur par dV; en l’intégrant ensuite
dans des limites déterminées et divisant l’intégrale par la plus
grande valeur de V, valeur que nous désignerons par U, on aura
la probabilité que la valeur de V sera comprise dans ces limites.
[p. 92]


It is at this stage, I believe, that things start becoming a little awkward
(though Fabry sees problems arising only later in the Memoir). Fabry’s
discussion sheds some light on the matter, and we shall therefore pursue it 
here. Before doing so, however, it might be wise to see exactly what it is 
we are trying to find. 

From the last quotation it appears that what is wanted is the probability
$\Pr[v_1 < V < v_2]$, say. Yet further on Laplace gives


la probabilité que la distance périhélie d’un astre qui entre dans
la sphère d’activité du Soleil sera comprise dans les limites zéro
et D, la valeur de V² n’excédant pas i²/r [p. 94],


which one might well write as $\Pr[0 < D < d \mid V^2 \le i^2/r]$. Again, on
page 94 we find the words


la probabilité que la distance périhélie étant comprise entre zéro
et D, l’orbite sera ou elliptique, ou parabolique, ou une hyperbole
dont le demi-grand axe sera au moins égal à 100,


which seems to refer to
$\Pr[\text{conic of the described form} \mid 0 < D < d]$.
Todhunter [1865, p. 493], on the other hand, argues that the problem is
really one in inverse probability, and that what we need to find is


$$ \Pr[v_1 < V < v_2 \mid 0 < D < d] . $$


Fabry’s approach is slightly different: he considers a ratio of two numbers,
which then, by his definition of probability, gives the desired result. The
final result, obtained by simplification of the integrands and subsequent
integration, is described by Fabry as follows:


Laplace donne cette expression comme représentant la probabilité
que la distance périhélie soit inférieure à q et la vitesse
initiale inférieure à $i/\sqrt{r}$. [p. 12]


Fabry writes elsewhere that 


le nombre des comètes de vitesses compris entre v et v + dv qui se
trouvent à l’intérieur d’une unité de volume située dans la région
considérée de l’espace, vers la limite de la sphère d’activité du
Soleil, peut être représenté par $\varphi(v)\,dv$, $\varphi(v)$ étant une
certaine fonction de v. [p. 8]


Multiplication of (32) above by $\varphi(v)\,dv$ gives the number of comètes
visibles having velocities between $v$ and $v + dv$ that may be found in a
unit of volume in the region of space considered. Thus the number of comètes
visibles with initial velocities between $v_1$ and $v_2$ that will be found
in a unit of volume is

$$ \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,\varphi(v)\,dv . $$

The definition of $U$ also occasions some difficulty. Supposing all values
of $V$ between 0 and some value $U$ to be equally probable, Fabry points out
that his $\varphi(v)$ will then be constant for all values of $V$ less than
$U$ and zero for all values of $V$ greater than $U$. Thus the total number of
comets in the interior of each unit of volume, in the region of space
considered, will be


$$ \int_0^U \varphi(v)\,dv = U\varphi , $$


where $\varphi$ denotes the constant value of $\varphi(v)$. One thus obtains
the ratio of the number of comètes visibles with velocities between $v_1$
and $v_2$ contained in a unit of volume, to the total number of comets in
that volume ($\varphi$ being assumed constant and the same in all regions of
space situated towards the limit of the sphere of activity of the sun) as


$$ \frac{1}{2U}\int_{v_1}^{v_2}\left\{1 - \frac{\sqrt{1 - d/r}}{rv}\sqrt{r^2v^2(1 + d/r) - 2df}\right\} dv $$


(the factor $\frac{1}{2}$ being of course missing in Laplace’s formulation).
Fabry’s result may thus be written as


$$ \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,\varphi(v)\,dv \bigg/ \int_0^U \varphi(v)\,dv . $$


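Notice next that

$$ \varphi(v)\,dv = \text{number of comets with velocities in } (v, v + dv) . $$


Thus


$$ \begin{aligned}
\frac{\varphi(v)\,dv}{\int_0^U \varphi(v)\,dv}
 &= \frac{\text{number of comets with velocities in } (v, v + dv)}{\text{total number of comets}} \\
 &= \Pr[v < V < v + dv] \\
 &= f_V(v)\,dv ,
\end{aligned} $$


where $f_V$ is a probability density function. Thus, as Fabry finds,


$$ \begin{aligned}
\int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,\varphi(v)\,dv \bigg/ \int_0^U \varphi(v)\,dv
 &= \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v] \left\{\varphi(v) \bigg/ \int_0^U \varphi(v)\,dv\right\} dv \\
 &= \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,f_V(v)\,dv \\
 &= \Pr[0 < D < d \;\&\; v_1 < V < v_2] .
\end{aligned} $$

Thus, if $\varphi(v)$ is constant ($= \varphi$),


$$ f_V(v) = \varphi \bigg/ \int_0^U \varphi\,dv = 1/U , $$

and


$$ \Pr[0 < D < d \;\&\; v_1 < V < v_2] = \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,dv / U . $$

Moreover,


$$ \begin{aligned}
\Pr[0 < D < d]
 &= \int_0^U \Pr[0 < D < d \mid V = v]\,f_V(v)\,dv \\
 &= \int_0^U \Pr[0 < D < d \mid V = v]\,\varphi(v)\,dv \bigg/ \int_0^U \varphi(v)\,dv \\
 &= \int_0^U \Pr[0 < D < d \mid V = v]\,dv / U ,
\end{aligned} $$

if $\varphi(v) = \varphi$ is constant. Thus

$$ \begin{aligned}
\Pr[v_1 < V < v_2 \mid 0 < D < d]
 &= \Pr[v_1 < V < v_2 \;\&\; 0 < D < d] / \Pr[0 < D < d] \\
 &= \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,dv \bigg/ \int_0^U \Pr[0 < D < d \mid V = v]\,dv .
\end{aligned} $$


This is all but Todhunter’s result: the only difference is in the limits of
integration in the denominator.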

Returning to Laplace’s result, we note, from the quotation following
equation (32) above, that what is wanted is


$$ \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,dv / U , $$


which, by our preceding discussion, with constant $\varphi$, is just
$\Pr[0 < D < d \;\&\; v_1 < V < v_2]$.
Now


$$ \begin{aligned}
\Pr[0 < D < d \;\&\; v_1 < V < v_2]
 &= \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,f_V(v)\,dv \\
 &= \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,\varphi(v)\,dv \bigg/ \int_0^U \varphi(v)\,dv ,
\end{aligned} \tag{33} $$


where $f_V(v) = \varphi(v) \big/ \int_0^U \varphi(v)\,dv$. For constant
$\varphi$ this becomes


$$ \begin{aligned}
\Pr[0 < D < d \;\&\; v_1 < V < v_2]
 &= \int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,\varphi\,dv \big/ U\varphi \\
 &= \frac{1}{U}\int_{v_1}^{v_2} \Pr[0 < D < d \mid V = v]\,dv ,
\end{aligned} $$


which in fact follows from (33) on our supposing that $V$ is uniformly
distributed over $(0, U)$, i.e. on our dividing the integral by the maximum
value of $V$. This is just what Laplace advocates in the quotation following
equation (32) above.

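Now to the limits $v_1$ and $v_2$: for the lower limit Laplace takes that
value of $V$ that makes $\sqrt{r^2v^2(1 + d/r) - 2d}$ zero, viz.
$\sqrt{2d/[r(r + d)]}$ (in Fabry’s notation this becomes
$\sqrt{2df/[r(r + d)]}$). Denoting this lowest value by $v_0$ and the upper
limit by $V_0$, and defining $z$ by


$$ \sqrt{r^2v^2(1 + d/r) - 2df} = rv\sqrt{1 + d/r} - z , \tag{34} $$


we have


$$ \begin{aligned}
I &= \int \left[1 - \frac{\sqrt{1 - d/r}}{rv}\sqrt{r^2v^2(1 + d/r) - 2df}\right] dv \\
  &= v + \frac{\sqrt{1 - d/r}}{r}\left[z/2 - 2\sqrt{2df}\,\arctan\bigl(z/\sqrt{2df}\bigr) - df/z\right] + c .
\end{aligned} $$


Now the integral being zero for $v = v_0$, or $z = \sqrt{2df}$, we have


$$ c = -v_0 + 2\sqrt{2df}\,\bigl(\sqrt{1 - d/r}/r\bigr)\,\pi/4 , $$


and hence

$$ \begin{aligned}
I = {} & v + \frac{\sqrt{1 - d/r}}{r}\left[z/2 - 2\sqrt{2df}\,\arctan\bigl(z/\sqrt{2df}\bigr) - df/z\right] \\
       & - v_0 + 2\sqrt{2df}\,\bigl(\sqrt{1 - d/r}/r\bigr)\,\pi/4 .
\end{aligned} \tag{35} $$


The value of $I$ between the limits $v_0$ and $V_0$ is then easily obtained.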

As can be seen from the above manipulation, the upper limit for $z$ is
a complicated function of $V_0$. Laplace thus proposes a series solution, for
which he lets $i = V_0\sqrt{r}$. The upper limit


$$ z = i\sqrt{r}\,\sqrt{1 + d/r}\left[1 - \sqrt{1 - 2df/\bigl(i^2 r(1 + d/r)\bigr)}\,\right] \tag{36} $$


is then developed by Laplace (Fabry’s notation) in a series, substitution
of which in (35) yields


$$ I = \left(\frac{\pi}{2} - 1\right)\frac{\sqrt{2df}}{r} - \frac{df}{ir\sqrt{r}} . $$


Fabry then shows that


$$ \frac{1}{2U}\int_{v_0}^{V_0} \Pr[0 < D < d \mid V = v]\,dv = \frac{1}{2}\left[\frac{(\pi - 2)\sqrt{2df}}{2Ur} - \frac{df}{iUr\sqrt{r}}\right] , \tag{37} $$


Laplace’s value being twice this with $f = 1$.
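
The quality of the series approximation may be gauged numerically. The Python sketch below is an illustration under stated assumptions: it evaluates the closed form for I between the lower limit (at which the radical vanishes) and the upper limit, taking the antiderivative as v + (√(1 − d/r)/r)[z/2 − 2√(2df) arctan(z/√(2df)) − df/z], and compares it with the two leading terms (π − 2)√(2df)/(2r) − df/(ir√r). The values r = 10⁵, d = 2, f = 1 and i² = 1002 (which corresponds to a = −100) are chosen for the example, and the function names are hypothetical.

```python
# Illustrative sketch; r = 1e5, d = 2, f = 1 are chosen for the example.
from math import atan, pi, sqrt

def exact_integral(V0, r, d, f=1.0):
    """Closed form for I between the lower limit v0 and the upper limit V0."""
    A = r * sqrt(1 + d / r)
    B = 2 * d * f
    v0 = sqrt(B) / A                                # the radical vanishes here
    z = B / (A * V0 + sqrt(A * A * V0 * V0 - B))    # stable form of A*V0 - radical
    s = sqrt(1 - d / r) / r
    return (V0 - v0
            + s * (z / 2 - 2 * sqrt(B) * atan(z / sqrt(B)) - d * f / z)
            + 2 * sqrt(B) * s * pi / 4)

def approximate_integral(i, r, d, f=1.0):
    """Leading terms (pi - 2) sqrt(2df) / (2r) - df / (i r sqrt(r))."""
    return (pi - 2) * sqrt(2 * d * f) / (2 * r) - d * f / (i * r * sqrt(r))

r, d = 1.0e5, 2.0
i = sqrt(1002.0)              # i^2 = (200 + r)/100 with R = 1
exact = exact_integral(i / sqrt(r), r, d)
approx = approximate_integral(i, r, d)
```

For values of r as large as this the two expressions should differ only by terms of relative order d/r.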
Notice next that substitution of $i = v\sqrt{r}$ in $1/a = 2/r - v^2/f$
yields


$$ 1/a = (2f - i^2)/fr . $$


The orbit is thus elliptic or hyperbolic according as $i^2 < 2f$ or
$i^2 > 2f$ (the parabola corresponding to $i^2 = 2f$). Supposing, for
example, that $a = -100$, or $-100R$ in Fabry’s notation, where $R$ is the
radius of the terrestrial orbit, we have


$$ i^2 = (200R + r)f/100R . $$


At this stage Fabry’s explanation errs a little: he starts to interpret prob- 
abilities as numbers — e.g. the expression (37) is referred to as 


le nombre des orbites dont la distance périhélie est inférieure à
q [= d] et qui sont elliptiques, paraboliques, ou hyperboliques
avec un demi-grand axe supérieur à 100R en valeur absolue.


[p. 13] 


It therefore seems better to follow Laplace’s original here. 

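Denoting by $A$ the event that the orbit is elliptic, parabolic or hyperbolic
with semi-major axis at least equal to 100 (or $100R$ “en valeur absolue”,
Fabry [1893-1895, p. 13]) we find, with $i^2$ as above,


$$ \Pr[0 < D < d \;\&\; A] = \frac{1}{2}\left[\frac{(\pi - 2)\sqrt{2df}}{2Ur} - \frac{10d\sqrt{f}}{Ur\sqrt{r(200R + r)/R}}\right] . $$


Denoting by $i' = U\sqrt{r}$ the value of $i$ corresponding to the upper
limit $U$ of the velocity, we have, again from (37),


$$ \Pr[0 < D < d] = \frac{1}{2}\left[\frac{(\pi - 2)\sqrt{2df}}{2Ur} - \frac{df}{i'Ur\sqrt{r}}\right] . $$


Thus

$$ \begin{aligned}
\Pr[0 < D < d \;\&\; \bar{A}] &= \Pr[0 < D < d] - \Pr[0 < D < d \;\&\; A] \\
 &= \frac{1}{2}\left[\frac{10d\sqrt{f}}{Ur\sqrt{r(200R + r)/R}} - \frac{df}{i'Ur\sqrt{r}}\right] ,
\end{aligned} $$


$\bar{A}$ denoting the event that the comets are “sensiblement
hyperboliques” (Fabry [1893-1895, p. 13]). Thus


$$ \frac{\Pr[0 < D < d \;\&\; A]}{\Pr[0 < D < d \;\&\; \bar{A}]}
 = \frac{\dfrac{(\pi - 2)\sqrt{2df}}{2} - \dfrac{10d\sqrt{f}}{\sqrt{r(200R + r)/R}}}{\dfrac{10d\sqrt{f}}{\sqrt{r(200R + r)/R}} - \dfrac{df}{i'\sqrt{r}}} , $$

an expression that of course depends on the value of $U$ through $i'$.
Letting $i'$ (and therefore $U$) be infinite, we obtain

$$ \frac{\Pr[0 < D < d \;\&\; A]}{\Pr[0 < D < d \;\&\; \bar{A}]} = \frac{(\pi - 2)}{10\sqrt{2d}}\sqrt{r\left(200 + \frac{r}{R}\right)} - 1 , \tag{39} $$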

or, in Laplace’s words, 


ainsi la distance périhélie étant supposée comprise entre zéro et
D, la probabilité que l’orbe sera ou une ellipse, ou une parabole,
ou une hyperbole d’un demi-grand axe au moins égal à 100, est
à la probabilité qu’il sera une hyperbole d’un demi-grand axe
inférieur, comme


ir 2) [Er + 200) - Ved 
[p. 95]. 


A numerical example now follows: taking d = 2R (“la limite des distances 
périhélies des cométes que nous pouvons voir” — Fabry [1893-1895, p. 14]) 
and r = 10°R, (39) yields the value 5712.668. As Laplace says, 


il y a donc à fort peu près cinquante-six à parier contre l’unité 
que, sur cent orbes cométaires observables, aucun ne doit être 
une hyperbole d’un demi-grand axe inférieur à 100. [p. 95] 
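
The two figures just quoted can be checked numerically. The following is a minimal sketch only, not part of Laplace's or Fabry's text: it assumes that the ratio (39) has the form ((π − 2)/10)·√(r(200R + r)/(2dR)) − 1, takes R as the unit of length, and treats the hundred comets as independent. The function names are illustrative.

```python
import math


def nonhyperbolic_to_hyperbolic_ratio(d, r, R=1.0):
    """Ratio (39) as read here (an assumption about its exact form):
    ((pi - 2) / 10) * sqrt(r * (200 R + r) / (2 d R)) - 1."""
    return (math.pi - 2) / 10 * math.sqrt(r * (200 * R + r) / (2 * d * R)) - 1


def odds_none_hyperbolic(ratio, n_comets=100):
    """Odds that none of n_comets comets is sensibly hyperbolic,
    assuming the comets are independent."""
    p_single = ratio / (ratio + 1)   # one comet is not sensibly hyperbolic
    p_all = p_single ** n_comets     # none of the n_comets is
    return p_all / (1 - p_all)


ratio = nonhyperbolic_to_hyperbolic_ratio(d=2.0, r=1e5)  # d = 2R, r = 10^5 R
odds = odds_none_hyperbolic(ratio)
print(round(ratio, 3), round(odds, 1))
```

Under these assumptions the ratio agrees with the value 5712.668 given above, and the odds come out a little over 56 to 1, in line with Laplace's "cinquante-six à parier contre l'unité".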


After noting that the preceding analysis supposes all values of D be- 
tween 0 and 2 to be equally possible (for all comets that one can perceive), 
Laplace points out that comets with perihelion distance greater than one 
are much less numerous than those with this distance less than one. He 
next attempts to prove that the probability of sensibly hyperbolic comets 
is further diminished by this fact. Although the examination of this case is 
difficult, Todhunter [1865] dismisses it all with the words “he proceeds to 
consider how this will modify his result” [p. 494]: we propose to be 
somewhat more explanatory here⁹¹. 

The introduction of y(D) — rather than 1 — as the prior on D is treated 
as in the 28th article of Laplace’s Mémoire sur les probabilités of 1778. The 
method used here consists in differentiating both numerator and denomi- 
nator with respect to D, multiplying each of the resulting expressions by 


y(D), and considering the ratio of the two resulting expressions⁹². 

Laplace thus proceeds as follows: note firstly that 


_ (w— 2)V2d d (7 — 2)/2d 
Re 2Ur iUryr SOU 


(40) 


as $i' \to \infty$, and also that 

$$ \Pr[0 < D < d \;\&\; a > r/(2 - i^2)] = \Pr[0 < D < d] - \Pr[0 < D < d \;\&\; a < r/(2 - i^2)] $$

$$ = \Pr[0 < D < d] - \Pr[0 < D < d \;\&\; V^2 < i^2/r] $$


_ (nm —2)V2d (1 — 2)/2d d 
7 2Ur 2Ur Ur./r 


d 
The expressions (40) and (41) are then each differentiated with respect to 
d, each multiplied by y(d), and the products then integrated, their ratio 


giving 
$$ \Pr[a > r/(2 - i^2) \mid 0 < D < d] = \Pr[0 < D < d \;\&\; a > r/(2 - i^2)] \,/\, \Pr[0 < D < d] $$


in the case in which y(d) is not necessarily identical to one. (This is in 
fact the procedure set out in Article 28 of Laplace’s Mémoire sur les 
probabilités.) 

Laplace checks that this ratio yields his previously derived result in 
the case in which y(d) = 1, and also considers the prior density y(d) = 
k exp(−d²), an assumption⁹³ that he finds supported by empirical data. In 
this case it transpires that 


il y a donc alors, à fort peu près, 8263 à parier contre l’unité 
qu’une nébuleuse qui pénètre dans la sphère d’activité du Soleil 
décrira un orbe dont le demi-grand axe sera au moins égal à 
100. [p. 97] 


The final conclusion is 


ainsi l’on peut regarder la supposition de y(D) constant, et ne 
s’étendant que jusqu’à D = 2, comme la limite des suppositions 
favorables aux mouvements hyperboliques sensibles, en sorte 
qu’il y a au moins 56 à parier contre l’unité que, sur cent comètes 
observables, aucune n’aura un semblable mouvement. [p. 97] 


Comments on Laplace’s results. 
Students of celestial mechanics were not slow to realize that certain lacunae 
were evident in Laplace’s exposition. Perhaps the first of those to point out 
the errors was Gauss, who devoted a major part of his review of 1815 of 
the appropriate volume of Connaissance des temps, to this memoir. Others 
who commented critically were Schiaparelli, Seeliger and Fabry⁹⁴. 

There are essentially two points to which exception may be taken: the 
first of these is concerned with the lower limit $v_1$ (as given in the integral 
in equation (33)), while the other concerns the series expansion 


c= Bli-S (1-4) +--] (42) 


(connected with this is the assumption that $i'$, or U, tends to infinity). 
We shall treat these matters seriatim, following Fabry in the main⁹⁵. 
Firstly, in the integral 


i h — Vial A +d/r) — a dv 


we have seen that Laplace takes as lower limit that value $v_1$ that makes 
$\sqrt{r^2v^2(1 + d/r) - 2df}$ zero, that is, $v_1 = \sqrt{2df/[r(r + d)]}$. Now there seems 
no reason for omitting smaller, positive values of V, though of course their 
inclusion will necessitate the addition of a term $\int_0^{v_1} \cdots\, dv$ (for an 
appropriate integrand, of which we shall say more anon) to the extant integral. Indeed, 
the perihelion distance is always less than d, no matter what the angle θ 
may be, and these values of V should therefore not be left aside. This lapse, 
charitably described as an “Uebereilung” by Gauss [1874, p. 582], is not 
as serious as it might at first sight appear to be: the velocities omitted 
correspond always to elliptic orbits⁹⁶, and their inclusion thus serves but 
to strengthen Laplace’s conclusion; indeed, Gauss shows that the odds of 
56 : 1 found by Laplace for the observing, among 100 comets, of one that 
does not have a “sensibly hyperbolic orbit”, are in fact raised to 157 : 1. 

Secondly, let us pass on to the infinite series and the approximation used 
by Laplace. That something is indeed amiss is evident on our noting that, 
for any finite a, 


i F — vin AA + d/r) — 2df | dv 


is infinite⁹⁷, while Laplace’s evaluation of this integral, obtained by a series 
expansion and a limiting process, yields a finite quantity. 

In his discussion of this point Fabry points out that Laplace’s 
development is correct when $i$ is a quantity of moderate size, or more precisely 
when it is of the same order as $\sqrt{f}$. Under this assumption Fabry shows 
that the formulae (35) and (36) above lead to 
df 
=o 
i/r 


which is identical to (42), the terms neglected in the brackets being at 
least $(d/r)^2$. On substituting this value for z, and V, (or $i/\sqrt{r}$), for v in 


(35) Fabry obtains 
/2d d 
i (=-1)-- i (44) 
ir./r 
as did Laplace. Fabry [1893-1895] emphasizes that the terms neglected here 
are at least of order $(d/r)^2$: 


[1 — (d/2r) (1 — f /#)] (43) 


la formule [(44)] peut donc bien remplacer la formule [(35)] dans 
le cas où $i$ est une quantité finie de grandeur modérée (du même 
ordre que $\sqrt{f}$). [p. 19] 


The formula (39) above was derived under the assumption that $U \to \infty$ 
(and hence that $i' \to \infty$). In this case, however, certain of the terms 
neglected above become infinite, and these terms are no longer negligible with 
respect to the terms conserved that remain finite. If one continues the 
infinite series further than Laplace did one finds terms that become infinite 
for $i$ infinite, and in this case (44) ceases to be equal to (35). 

Moreover, as Gauss noted, the assumption that all values of the 
velocities are equally probable over $[0, \infty)$ is inadmissible⁹⁸, since this leaves an 
infinitely small probability for each finite velocity. This implies that orbits 
that are nearly parabolic will be infinitely less probable, and all the 
probability will on the contrary be placed on orbits that are indistinguishable 
from straight lines and will be traversed with infinite velocity⁹⁹. 

Fabry notes that, even in the case in which U is infinite, the series 
expansion requires rectification: the velocities of celestial bodies being of the same 
order of magnitude as the velocity of the earth in its orbit (which value 
is $\sqrt{f/R}$), if U and V are magnitudes of this order, $i$ is of order $\sqrt{fr/R}$. 
Fabry proposes then [p. 20] to repeat his earlier development, keeping V 
in the calculation and carrying the expansions further. He shows that, on 
neglecting terms of order 1/r® and higher [p. 22], (35) becomes 


AEE N(i-Z) HE (r-¥)- wo 


(He also verifies carefully [p. 22] that the terms neglected in this latter ex- 
pression are really negligible.) It is worth noting that if V becomes infinite 
so does (45), which is in accord with what we have already said. 
Introducing these two corrections into Laplace’s discussion¹⁰⁰ and 
expanding the series appropriately, Fabry shows that (35) becomes¹⁰¹, on 
neglecting terms of order $1/r^2$, 


wee (1 +) +55 ( of). (46) 


r 2\ OF 9r2\ Vd 


Thus the ratio of the number of comets that are not sensibly hyperbolic to 
the number of those that are is 


JTdF(n/2)(2r — d) +d? [V — (2f/Va)] 
#0 —V — (2f/d) (1/0 - 1/V)] 


Here U is the (hypothesized) largest value of the velocities, and V is the 
velocity corresponding to the semi-major axis −100R (or to whatever other 
semi-major axis we choose to separate the orbits of comets that are sensibly 
hyperbolic from those that are not). 

Fabry now discusses a numerical example showing that comets with 
sensibly hyperbolic orbits ought to be exceedingly rare, as Laplace in fact 
asserted¹⁰². Much of Fabry’s monograph is in fact devoted to the study of 
this question, taking into account complicating factors such as the 
movement of the sun in space and comets near to the sun: we shall not pursue 
the matter further here¹⁰³. 

Let us conclude this discussion with two pertinent quotations. The first 


is from Gauss [1874]: 


wenn inzwischen die Wahrscheinlichkeitsrechnung auch gleich 
keinen entscheidenden Beweis für die Hypothese liefern kann, 
so entscheidet sie doch, eben wegen unsrer Unwissenheit über 
die Grenze U, auch durchaus nichts gegen die Hypothese. 

[p. 583] 


The second is from Schiaparelli [1874]: 


en 1813 les astronomes n’avaient pas beaucoup de confiance 
dans les spéculations de W. Herschel sur le mouvement 
propre du système solaire: on pouvait donc raisonnablement en 
exclure la considération. Cela n’est plus permis aujourd’hui. En 
reprenant donc le problème sous le point de vue de Laplace, 
mais avec la supposition que le système solaire se transporte 
dans l’espace avec la vitesse u comparable à celles des planètes, 
on trouvera non seulement un très grand excès de probabilité 
en faveur des orbites fortement hyperboliques, mais on verra 
de plus, que les hyperboles dont l’axe approche de la quantité 
$-1/u^2$ doivent être plus fréquentes que toutes les autres. Cela 
étant contraire à l’observation, il faut conclure que les comètes 
ne sont point des corps de nature stellaire. [p. 80] 


7.14 Two memoirs 


In the Connaissance des Temps for 1818, printed in 1815, Laplace published 
two articles bearing on our subject, viz. Sur l’application du calcul des 
probabilités à la philosophie naturelle and Sur le calcul des probabilités appliqué 
à la philosophie naturelle. The material of these papers being reproduced 
in the first Supplement to the Théorie analytique des probabilités (and part 
of the first paper being repeated in the introduction to that book), we shall 
postpone consideration of the contents until we discuss the latter work¹⁰⁴. 
Indeed, there seems little in the two memoirs, as originally printed, that is 
pertinent. 


7.15 Théorie analytique des probabilités 


7.15.1 Introduction 


The introduction to the Théorie analytique des probabilités¹⁰⁵, published 
separately under the title Essai philosophique sur les probabilités, is a 
much expanded version of a Leçon on probabilities delivered by Laplace 
at the Écoles Normales in 1795 under the title Sur les probabilités¹⁰⁶. A 
sketch of certain passages in the Essai appeared in Laplace’s Notice sur les 
probabilités¹⁰⁷ of 1810, and the Essai itself underwent drastic changes at 
Laplace’s hands, from the first edition of 1814 to the fifth edition of 1825. 
It is to this last edition, the last to appear before Laplace’s death, that 
attention will be paid here¹⁰⁸. 

Seven general principles of probability are given in the section of the 
Essai entitled “Principes généraux du Calcul des Probabilités”. The sixth 
and seventh of these are particularly pertinent to the present work, and 
are accordingly given below (all page references are to the Thom/Bru 1986 
edition of the Essai). 


VI. Chacune des causes, auxquelles un événement observé peut 
être attribué, est indiquée avec d’autant plus de vraisemblance, 
qu’il est plus probable que, cette cause étant supposée exister, 
l’événement aura lieu; la probabilité de l’existence d’une 
quelconque de ces causes est donc une fraction dont le numérateur 
est la probabilité de l’événement, résultante de cette cause, et 
dont le dénominateur est la somme des probabilités semblables 
relatives à toutes les causes: si ces diverses causes considérées 
a priori sont inégalement probables, il faut au lieu de la 
probabilité de l’événement, résultante de chaque cause, employer 
le produit de cette probabilité, par la possibilité de la cause 
elle-même. C’est le principe fondamental de cette branche de 
l’Analyse des hasards, qui consiste à remonter des événements 
aux causes. [p. 42] 

VII. La probabilité d’un événement futur est la somme des 
produits de la probabilité de chaque cause, tirée de l’événement 
observé, par la probabilité que, cette cause existant, l’événement 
futur aura lieu. [p. 44] 


Symbolically the two parts of the sixth principle can be expressed as 

$$ \Pr[E \mid H_i] > \Pr[E \mid H_j] \Rightarrow \Pr[H_i \mid E] > \Pr[H_j \mid E] $$

$$ \Pr[H_i \mid E] = \Pr[E \mid H_i] \Big/ \sum_j \Pr[E \mid H_j], $$

both of which are true if $\Pr[H_i] = \Pr[H_j]$ for all $i$ and $j$. More generally, 
of course, 

$$ \Pr[H_i \mid E] = \Pr[E \mid H_i] \Pr[H_i] \Big/ \sum_j \Pr[E \mid H_j] \Pr[H_j]. $$


Laplace also notes that “Ce principe donne la raison pour laquelle on 
attribue les événements réguliers à une cause particulière” [p. 42]. Pearson 
[1978, p. 658] asserts that Laplace took this Principle without acknowledge- 
ment from Condorcet, who had in turn developed it from Bayes. While it 
is true that this result is not in Bayes’s Essay, it is to be found in Laplace’s 
Mémoire sur la probabilité des causes par les événements of 1774, a memoir 
that seems to antedate any pertinent writings published by Condorcet. 
Principle VII may be symbolized as 


$$ \Pr[E_2 \mid E_1] = \sum_i \Pr[H_i \mid E_1] \Pr[E_2 \mid H_i], $$


which is true if we assume the conditional independence of $E_1$ and $E_2$ with 
respect to $\{H_i\}$. (This formulation is supported by the example given by 
Laplace following its presentation.) 

The seventh principle is followed by a discussion of the case in which 
the probability of the simple event is unknown. The suggestion in this case 
is to suppose all values from zero to one equally probable. The pertinent 
passage is perhaps a little confusing, and since it embodies a discrete form 
of Bayes’s Theorem, we quote it here in full: 


Quand la probabilité d’un événement simple est inconnue, on 
peut lui supposer également toutes les valeurs depuis zéro jusqu’à 
l’unité. La probabilité de chacune de ces hypothèses, tirée de 
l’événement observé, est par le sixième principe une fraction 
dont le numérateur est la probabilité de l’événement dans cette 
hypothèse, et dont le dénominateur est la somme des 
probabilités semblables relatives à toutes les hypothèses. Ainsi, la 
probabilité que la possibilité de l’événement est comprise dans 
des limites données, est la somme des fractions comprises dans 
ces limites. Maintenant, si l’on multiplie chaque fraction par la 
probabilité de l’événement futur, déterminée dans l’hypothèse 
correspondante, la somme des produits relatifs à toutes les 
hypothèses sera par le septième principe la probabilité de 
l’événement futur, tirée de l’événement observé. [p. 45] 


In modern terms this may be written as follows: let E denote the 
initial single event, O the observed event, F the future event, and $H_i$ the 
hypothesis $\Pr[E] = p_i$, where $p_i \in [0,1]$ and $i \in \{1, 2, \ldots, n\}$. Then 


$$ \Pr[H_i \mid O] = \Pr[O \mid H_i] \Big/ \sum_j \Pr[O \mid H_j] $$


and 


$$ \Pr[x < p_i < x' \mid O] = {\sum}' \Pr[H_i \mid O] = {\sum}' \Pr[O \mid H_i] \Big/ \sum_j \Pr[O \mid H_j], $$


where ${\sum}'$ indicates that the summation is taken over all $p_i \in (x, x')$. 
Furthermore, 


$$ \Pr[F \mid O] = \sum_i \Pr[H_i \mid O] \Pr[F \mid H_i]. $$


It is assumed here, of course, that $\Pr[H_i] = \Pr[H_j]$, and that F and O are 
independent with respect to $\{H_i\}$. 
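
The three expressions above can be illustrated by a small numerical sketch. It is not taken from Laplace: it assumes a finite grid of equally probable hypotheses p_i = i/(m − 1), an observed event O consisting of a given number of occurrences and failures of E in independent trials, and a future event F that is one further occurrence of E. All function names are hypothetical.

```python
def posterior(successes, failures, n_hyp=1001):
    """Pr[H_i | O] for hypotheses p_i on an equally spaced grid,
    all hypotheses being equally probable a priori (assumed)."""
    grid = [i / (n_hyp - 1) for i in range(n_hyp)]
    like = [p ** successes * (1 - p) ** failures for p in grid]  # Pr[O | H_i]
    total = sum(like)
    return grid, [value / total for value in like]


def prob_between(grid, post, lo, hi):
    """Pr[lo < p_i < hi | O]: sum of the fractions lying within the limits."""
    return sum(w for p, w in zip(grid, post) if lo < p < hi)


def predictive(grid, post):
    """Pr[F | O] where F is one further occurrence of E."""
    return sum(p * w for p, w in zip(grid, post))


grid, post = posterior(successes=5, failures=0, n_hyp=10001)
print(prob_between(grid, post, 0.5, 1.0), predictive(grid, post))
```

As the grid is refined the last quantity approaches (s + 1)/(n + 2), here 6/7, which is the rule of succession referred to in the next paragraph.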

From this the rule of succession follows. Laplace applies this principle to 
the problem of the sun’s rising, and points out that his solution is different 
from Buffon’s¹⁰⁹, since 


la vraie manière de remonter des événements passés à la 
probabilité des causes et des événements futurs, était inconnue à cet 
illustre écrivain. [p. 46] 


(See §5.8 of the present work for a discussion of Buffon’s result¹¹⁰.) 

There is nothing more that is pertinent to our present study until we 
reach the eleventh section, entitled “De la probabilité des témoignages”, 
in which Laplace applies his earlier principle on the probability of causes 
elicited from observed events¹¹¹. Here Principle VI is used to estimate the 
veracity of a witness, as shown in the following example: suppose that a 
number has been drawn [at random] from an urn containing 1000 numbers 
[assumed distinct]. A witness to the drawing announces that the number 
79 was drawn: what is the probability that he tells the truth? Let E denote 
the event that it is announced that the number drawn is 79: then 


$$ \Pr[E] = \Pr[E \mid \text{witness lies}] \Pr[\text{witness lies}] + \Pr[E \mid \text{witness tells the truth}] \Pr[\text{witness tells the truth}]. \tag{47} $$


Now from prior experience it is known that 


$$ \Pr[\text{witness lies}] = \frac{1}{10} = 1 - \Pr[\text{witness tells the truth}]. $$


Furthermore, 

$$ \Pr[E \mid \text{witness tells the truth}] = \Pr[\text{79 drawn}] = \frac{1}{1000} $$

and 

$$ \Pr[E \mid \text{witness lies}] = \Pr[E \mid \text{79 not drawn}] \times \Pr[\text{79 not drawn}] = \frac{1}{999} \times \frac{999}{1000} = \frac{1}{1000}. $$


(These probabilities are stated to be determined a priori. Note the tacit 
equiprobability assumption.) Thus, finally¹¹², 


$$ \Pr[\text{witness tells the truth} \mid E] = \Pr[E \mid \text{witness tells the truth}] \times \Pr[\text{witness tells the truth}] \,/\, \Pr[E] $$

$$ = \frac{9}{10000} \Big/ \left( \frac{1}{10000} + \frac{9}{10000} \right) = \frac{9}{10}. $$


(Similarly for falsehood.) Laplace also considers the case in which the 
witness has an interest in the number drawn, and discusses how this will affect 
the final result. 


Le bon sens nous dicte que cet intérêt doit inspirer de la défiance, 
mais le calcul en apprécie l’influence. [p. 120] 


A further example, in similar vein, concerning an urn containing one 
white and 999 black balls shows that 


la probabilité de l’erreur ou du mensonge du témoin devient 
d’autant plus grande, que le fait attesté est plus extraordinaire 
[p. 122]. 
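
The contrast between the two urn examples can be made concrete with exact arithmetic. The sketch below is an illustration under stated assumptions rather than Laplace's own computation: the witness is taken to lie once in ten times in both cases; a lying witness in the ticket example is assumed to name one of the 999 other numbers at random; and in the ball example a lying witness can only assert the other colour. The helper name prob_truth is hypothetical.

```python
from fractions import Fraction


def prob_truth(prior_fact, p_truth, p_assert_if_lying):
    """Pr[witness tells the truth | assertion made].

    prior_fact        : prior probability that the asserted fact occurred
    p_truth           : prior probability that the witness tells the truth
    p_assert_if_lying : chance that a lying witness makes this particular
                        assertion when the fact did not occur
    """
    true_path = prior_fact * p_truth
    false_path = (1 - prior_fact) * (1 - p_truth) * p_assert_if_lying
    return true_path / (true_path + false_path)


veracity = Fraction(9, 10)  # assumed: the witness lies once in ten times

# 1000 tickets; the witness announces "79".
ticket = prob_truth(Fraction(1, 1000), veracity, Fraction(1, 999))

# One white and 999 black balls; the witness announces "white".
ball = prob_truth(Fraction(1, 1000), veracity, Fraction(1))

print(ticket, ball)
```

With these assumptions the first assertion retains the witness's prior credibility of 9/10, while the second is credible only with probability 1/112: the prior improbability of the fact is the same in both cases, but the number of distinct false statements open to a lying witness is not.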


In the first edition of his A System of Logic Ratiocinative and Inductive of 
1843 John Stuart Mill claims that these two examples showed that Laplace 
confused two meanings of “improbability”: 


If, says Laplace, there are one thousand tickets in a box, and 
one only has been drawn out; then if an eye-witness affirms that 
the number drawn was 79, this, though the chances were 999 
in 1000 against it, is not incredible, because the chances were 
equally great against every other number. But (he continues) if 
there are in the box 999 black balls and only one white, and the 
witness affirms that the white ball was drawn, this is incredible; 
because there was but one chance in favour of white, and 999 
in favour of some black ball. 
This appears to me entirely fallacious. 
[Book III, Chap. XXV, §5.] 


Mill claims further that the two assertions ought to carry the same amount 
of credibility, a statement that he justifies by supposing that the balls are 
numbered, the white ball bearing the number 79. 


Then the drawing of the white ball, and the drawing of No. 79, 
are the very same event; how then can the one be credible, the 
other absolutely incredible? [loc. cit.] 


The error, Mill goes on to say, “is founded upon a misapplication ... of 
[Laplace’s] own sixth theorem of the doctrine of chances” (loc. cit.): indeed, 
according to Mill Laplace reasoned from this theorem that 


in the case of the thousand tickets, the cause mendacity might 
produce any one of 999 untrue statements, while in the case of 
the balls, there being only two statements to make, viz. white 
or black, and one of these being true, the cause mendacity could 
only produce one untrue statement: and consequently (the an- 
tecedent probability of mendacity from the character of the wit- 
ness being supposed the same in both cases) mendacity was 999 
times less likely to have produced the particular assertion made, 
and is therefore 999 times less likely to have existed, in the for- 
mer case than in the latter. 

The error of this argument seems to be... that of applying a 
theorem, only true of the degrees of probability of causes, to the 
probability of what are neither causes, nor in any way specially 
connected with the effect. [loc. cit.] 


However, in the second edition of 1846 of his A System of Logic Ratioci- 
native and Inductive Mill partly recanted, writing 


This argument of Laplace’s, though I formerly thought it falla- 
cious, is irrefragable in the case which he supposes, and in all 
others which that case fairly represents. But I do not think his 
case a perfect representative of all cases of coincidence. 


[Book III, Chap. XXV, §6.] 


The strength of this retraction was somewhat diluted in later editions. 

In another example Laplace admits the possibility of the witness’s being 
mistaken: in this case the two hypotheses of the earlier example are replaced 
by the following four¹¹³: 

(i) the witness neither lies nor is deceived; 
(ii) the witness does not lie but is deceived; 
(iii) the witness lies and is not deceived; 
(iv) the witness lies and is deceived. 

And a further example shows that 


une conséquence impossible est la limite des conséquences 
extraordinaires, comme l’erreur est la limite des invraisemblances; 
la valeur des témoignages, qui devient nulle dans le cas d’une 
conséquence impossible, doit donc être très affaiblie dans celui 
d’une conséquence extraordinaire. [p. 123] 


Writing of these examples Zabell [1988a] says 


Laplace’s analysis was initially faulted by both Mill and Venn, 
each of whom in later editions of their books grudgingly con- 
ceded that Laplace’s analysis is correct in the circumstances he 
posits. [p. 179] 


In the thirteenth section, entitled “De la probabilité des jugements des 
tribunaux”, Laplace uses a Bayes-type argument. It will be more conve- 
nient, however, to postpone discussion of this point until consideration of 
the First Supplement to the Théorie analytique des probabilités. 

In the final section of the Essai we find the only reference to Bayes, viz. 


Bayes, dans les Transactions Philosophiques de l’année 1763, a 
cherché directement la probabilité que les possibilités indiquées 
par des expériences déjà faites sont comprises dans les limites 
données, et il y est parvenu d’une manière fine et très ingénieuse, 
quoiqu’un peu embarrassée. Cet objet se rattache à la théorie 
de la probabilité des causes et des événements futurs, conclue 
des événements observés, théorie dont j’exposai quelques années 
après les principes, avec la remarque de l’influence des inégalités 
qui peuvent exister entre les chances que l’on suppose égales. 
Quoique l’on ignore quels sont les événements simples que ces 
inégalités favorisent, cependant cette ignorance même accroît 
souvent la probabilité des événements composés. [pp. 200-201] 


Finally, as a summary of the Essai, let us note the following sentence 
from the last paragraph: 


on voit par cet Essai que la théorie des probabilités n’est au fond 
que le bon sens réduit au calcul: elle fait apprécier avec exacti- 
tude, ce que les esprits justes sentent par une sorte d’instinct, 
sans qu’ils puissent souvent s’en rendre compte. [p. 206] 


7.15.2 Livre 1: Calcul des fonctions génératrices 


This, the first of the two Books into which the Théorie analytique des 
probabilités¹¹⁴ is divided, does not contain anything pertinent to our present 
topic. It is devoted to a study of generating functions and (in the second 
part) certain approximations of functions of large numbers¹¹⁵. 


7.15.3 Livre 2: Théorie générale des probabilités 


In the first chapter, entitled “Principes généraux de cette théorie”, Laplace 
presents¹¹⁶, in both words and mathematical symbols, those “principes 
généraux de l’Analyse des Probabilités” [p. 190] of which he had already 
written in the Essai. 

He begins by stating his usual definition of the probability of an event, 
viz.¹¹⁷ 


la probabilité d’un événement est le rapport du nombre des 
cas qui lui sont favorables au nombre de tous les cas possibles, 
lorsque rien ne porte à croire que l’un de ces cas doit arriver 
plutôt que les autres, ce qui les rend, pour nous, également 
possibles [p. 181], 


and he goes on to say 


la juste appréciation de ces cas divers est un des points les plus 
délicats de l’Analyse des hasards. [p. 181] 


Cognisance is also taken of the situation in which the cases are not equally 
possible: 


si tous les cas ne sont pas également possibles, on déterminera 
leurs possibilités respectives, et alors la probabilité de l’événement 
sera la somme des probabilités de chaque cas favorable. [p. 181] 


These definitions are then expressed in mathematical symbols. 
Laplace next passes to the question of independence, pointing out that 
if $\{E_i\}$ is a sequence of independent simple events, with $p_i = \Pr[E_i]$, 
then $\Pr[E_1 E_2 \ldots E_n] = \prod_{1}^{n} p_i$. Attention is then turned to dependence, and 
Laplace states that in the case of two simple events where the supposition 
of the occurrence of the first ($E_1$) affects the probability of the occurrence 
of the second ($E_2$), we have $\Pr[E_1 E_2] = \Pr[E_2 \mid E_1] \Pr[E_1]$. 

Laplace also provides an explanation of the a priori determination of 
probability: he says that the probability of an event is determined a priori 
“ou indépendamment de ce qui est déja arrivé” [p. 183]. He thus deduces 
what he describes as “ce nouveau principe” [p. 183], that for any future 
event $E_2$ depending on an observed event $E_1$ (and $E_1$, $E_2$ need not 
necessarily be simple), $\Pr[E_2 \mid E_1] = \Pr[E_1 E_2]/\Pr[E_1]$, where each term on 
the right-hand side is determined a priori. 

One might call this a prospective use of conditional probability; 
however, immediately after stating this principle Laplace goes on to frame a 
retrospective definition¹¹⁸: he writes 


de là découle encore cet autre principe relatif à la probabilité 
des causes, tirée des événements observés [p. 183], 


and he then formulates the following expressions: for a sequence $\{H_i\}$ of 
causes and an event E, 


(i) $\Pr[H_i \mid E] : \Pr[H_j \mid E] :: \Pr[E \mid H_i] : \Pr[E \mid H_j]$, $i \neq j$ 
(ii) $\Pr[H_i \mid E] = \Pr[E \mid H_i] \big/ \sum_j \Pr[E \mid H_j]$. 


No mention of a priori equipossibility is initially made, though it is 
explicitly stated when the translation into symbols is effected, and the 
corresponding formula for the non-equipossible case is also given [p. 184]. 
Following an example, Laplace states the following principle: 


la probabilité d’un événement futur est la somme des produits 
de la probabilité de chaque cause, tirée de l’événement observé, 
par la probabilité que, cette cause existant, l’événement futur 
aura lieu [p. 186], 


or¹¹⁹ $\Pr[E_2 \mid E_1] = \sum_i \Pr[H_i \mid E_1] \Pr[E_2 \mid H_i]$. 

Consideration is given to the question of obtaining two heads (say) in 
two tosses of a coin known to be biased, though whether in favour of heads 
or tails is unknown. The first toss of this coin will yield heads still with 
probability ½, though the probability of the result of the second toss is 
modified. This procedure is then extended to any events whatsoever 
[pp. 188-189]. 
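
A short exact computation shows the effect described here. It is a sketch under assumptions not spelled out in the passage above: the coin is taken to favour one face with probability (1 + α)/2, the size α of the bias being known, and the two possible directions of the bias are taken as equally likely a priori. The function name is illustrative.

```python
from fractions import Fraction


def two_toss_probabilities(alpha):
    """Coin with bias of known size alpha but unknown direction
    (both directions assumed equally likely a priori)."""
    half = Fraction(1, 2)
    p_hi = (1 + alpha) / 2   # chance of heads if the bias favours heads
    p_lo = (1 - alpha) / 2   # chance of heads if the bias favours tails
    first_head = half * p_hi + half * p_lo
    two_heads = half * p_hi ** 2 + half * p_lo ** 2
    second_given_first = two_heads / first_head
    return first_head, two_heads, second_given_first


first, both, conditional = two_toss_probabilities(Fraction(1, 10))
print(first, both, conditional)
```

The first toss gives heads with probability 1/2 whatever α may be, but two heads occur with probability (1 + α²)/4, which exceeds 1/4: the first result carries information about the direction of the bias, so the probability attached to the second toss is altered.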

Finally, there is some discussion of moral and mathematical expectation, 
with the usual definition of the latter being given [p. 189]. 

In Chapter II, “De la probabilité des événements composés d’événements 
simples dont les possibilités respectives sont données”, Laplace contrasts 
[p. 193] “l’espérance mathématique” with “crainte”: thus the translation 
of the former as “mathematical hope” rather than “mathematical expec- 
tation” is not altogether something to be avoided. 

Perhaps the only other point worth noting here is that Laplace shows 
[pp. 278-279] that the appropriate law to describe a distribution of errors is 
y = a exp(−2ax), under the assumptions that the law is initially unknown, 
that the probabilities tend to zero as the errors increase in absolute value, 
and that the probability of an error of +ε is equal to that of −ε. 

Chapter III, “Des lois de la probabilité qui résultent de la multiplication 
indéfinie des événements”, is devoted to a proof of Bernoulli’s Theorem and 
certain examples using this result, a result of which Laplace writes 


la détermination de ces accroissements et de ces limites est une 
des parties les plus intéressantes et les plus délicates de l’analyse 
des hasards. [p. 280] 


Laplace shows firstly that, if p and 1 — p are the respective probabilities 
of two events A and B, then in a very large number of trials (“coups”), 


(i) the most probable of all combinations that can arise is that in which 
each event is repeated proportionally to its probability, and 


(ii) the probability that the difference between the ratio of the number 
of times that the event A can occur to the total number of trials, and 
the “facilité” p of that event, lies between the limits 


$$ \frac{x - np}{n} \pm \frac{t\sqrt{2xx'}}{n\sqrt{n}} $$

is 

$$ \frac{2}{\sqrt{\pi}} \int_0^t e^{-u^2}\,du + \frac{\sqrt{n}}{\sqrt{2\pi xx'}}\, e^{-t^2}, \tag{48} $$


where n is the total number of trials, in which A and B occurred $x$ and $x'$ 
times respectively, $t = l\sqrt{n}/\sqrt{2xx'}$, and $l$ is that term in the expansion of 
$[p + (1 - p)]^{x + x'}$ that contains $p^{x-l}(1 - p)^{x'+l}$. (The approximation (48) is 
derived under the assumptions that terms of order 1/n are neglected, and 
that $l^2$ does not exceed n in order of magnitude.) 

Although I do not intend to prove the above result here, it is perhaps 
not inadvisable briefly to outline some steps in Laplace’s procedure. To this 
end, suppose that p and q = 1 − p are the (initial) probabilities of the events 
A and B respectively. Denoting by X the random variable indicating the 
number of times A occurs in n trials, we require $\Pr[\alpha_n < X < \beta_n \mid n, p]$. 

Now the probability that A and B occur $x$ and $x' = n - x$ times is 


$$ \binom{n}{x} p^x (1 - p)^{x'}, $$


the greatest value of which is achieved when $p : 1 - p :: x : x'$. Laplace then 
shows that $\Pr[x - r \le X \le x + r \mid n, p]$ is approximately 


$$ \frac{2}{\sqrt{\pi}} \int_0^t e^{-u^2}\,du + \frac{\sqrt{n}}{\sqrt{2\pi xx'}}\, e^{-t^2}, \tag{49} $$

that is, the approximate value of the sum of (2r + 1) terms of the expansion 
of $[p + (1 - p)]^n$, the greatest term in this expansion being the middle term 
in the (2r + 1) terms. Writing $x = np + z$ and $t = r\sqrt{n}/\sqrt{2xx'}$, we see 
that (49) gives the probability 


$$ \Pr[np + z - t\sqrt{2xx'}/\sqrt{n} \le X \le np + z + t\sqrt{2xx'}/\sqrt{n}], \tag{50} $$


or 

$$ \Pr[z/n - t\sqrt{2xx'}/n\sqrt{n} \le X/n - p \le z/n + t\sqrt{2xx'}/n\sqrt{n}]. \tag{51} $$


This is the result given in (ii) above, and is as far as Laplace takes the direct 
result, although some remarks not pertinent to our present investigations 
follow. 

Both Keynes [1921, chap. 30] and Todhunter [1865, art. 993], however, 
carry the argument further. Supposing that for large n, z may be ignored 
in comparison with np, we have $xx' \approx n^2pq$. In this case the probability 
(51) becomes 


$$ \Pr[-t\sqrt{2pq}/\sqrt{n} \le X/n - p \le t\sqrt{2pq}/\sqrt{n}], \tag{52} $$


this being approximately given by 


$$ \frac{2}{\sqrt{\pi}} \int_0^t e^{-u^2}\,du + \frac{1}{\sqrt{2\pi npq}}\, e^{-t^2}. \tag{53} $$

This latter expression is in fact the only one given by Keynes (loc. cit.), 
who makes no mention of the expressions (49) and (50), the only ones given 
by Laplace. 
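
The quality of the approximation can be examined numerically. The sketch below is an illustration only: it reads (49) as erf(t) + √n e^(−t²)/√(2πxx'), with t = r√n/√(2xx'), takes x to be np rounded to the nearest integer (so that z is ignored), and compares the result with the exact binomial sum of 2r + 1 terms. The function names are hypothetical.

```python
import math


def exact_central_probability(n, p, r):
    """Exact Pr[x - r <= X <= x + r] for X ~ Binomial(n, p), x = round(np)."""
    x = round(n * p)

    def log_pmf(k):
        # log of C(n, k) p^k (1 - p)^(n - k), computed via lgamma for stability
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1 - p))

    return sum(math.exp(log_pmf(k)) for k in range(x - r, x + r + 1))


def laplace_approximation(n, p, r):
    """Expression (49) as read here: (2/sqrt(pi)) * integral_0^t exp(-u^2) du
    plus sqrt(n) * exp(-t^2) / sqrt(2 pi x x'), with t = r sqrt(n) / sqrt(2 x x')."""
    x = round(n * p)
    x_prime = n - x
    t = r * math.sqrt(n) / math.sqrt(2 * x * x_prime)
    correction = math.sqrt(n) / math.sqrt(2 * math.pi * x * x_prime) * math.exp(-t * t)
    return math.erf(t) + correction   # erf(t) equals the integral term


exact = exact_central_probability(1000, 0.3, 20)
approx = laplace_approximation(1000, 0.3, 20)
print(exact, approx)
```

The second term of (49) plays the part of what would now be called a continuity correction: without it the integral alone underestimates the sum of the 2r + 1 terms.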


Laplace proceeds next to an inverse form of the theorem¹²⁰, writing 


si l’on connaît le nombre de fois que sur n coups l’événement 
a [= A] est arrivé, la formule (o) [= (48)] donnera la 
probabilité que sa facilité p, supposée inconnue, sera compris dans 
des limites données. [p. 286] 


This is shown as follows: denoting by $i$ the number of times that A occurs 
in the n trials, Laplace states that his preceding result gives the probability 
that $i/n - p$ will be contained within the limits 


z  TvV2ex! 

pa 

n n/n 
where T is the limit of t. Since $T\sqrt{2xx'}/n\sqrt{n}$ is of order $1/\sqrt{n}$, and since 
terms of order 1/n are being neglected in deriving the approximations, one 
may substitute $i$ for $x$ and $n - i$ for $x'$, with the result that the limits in 


(54) become 
i Pveten (55) 


(54) 


a 


1 7 


It thus follows that the probability that the “facilité” p of A lies within 
these limits is given by 


T : 7 
= | e“ dut+ A 
VT Jo J 2ri(n — 1) 


(56) 
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From this Laplace concludes that, as n increases, the interval of the limits 
contracts, and the probability that p falls within these limits approaches 1. 
“C’est ainsi que les événements, en se développant, font connaitre leurs 
probabilités respectives” [p. 287]. 

However, Laplace’s discussion does not end here. He proposes an alter- 
native method (for trenchant criticism of which see Keynes [1921, chap. 
30]) with the words 


on parvient directement à ces résultats, en considérant p comme
une variable qui peut s’étendre depuis zéro jusqu’à l’unité, et en
déterminant, d’après les événements observés, la probabilité de
ses diverses valeurs, comme on le verra lorsque nous traiterons
de la probabilité des causes déduite des événements observés.
[p. 287] 


This alternative procedure is explored in Chapter VI: we shall postpone 
discussion of it for the nonce. 

The rest of the present chapter contains applications of this result to
various urn problems, and consideration is also given (i) to the result obtained
when more than two events are considered, and (ii) to the case in
which the probability p is replaced by some specific function f(p).

Chapter IV, “De la probabilité des erreurs des résultats moyens d’un 
grand nombre d’observations et des résultats moyens les plus avantageux” , 
contains the development of least squares theory. Writing of this chapter 
Todhunter [1865] calls it “the most important in Laplace’s work, and per- 
haps the most difficult” [art. 1001], and he goes on to say 


Laplace’s processes in this Chapter are very peculiar, and it is 
scarcely possible to understand them or feel any confidence in 
their results without translating them into more usual mathe- 
matical language. [art. 1001] 


Happily only the last two articles (numbers 23 and 24) of this chapter are 
at all pertinent to the present work. 

In Article 23 Laplace proposes to switch from the consideration of observations
not yet made, to the consideration of the mean result of observations
already made, whose respective deviations (“écarts”) are known¹²¹.
Consider s observations, with results $A, A+q, A+q^{(1)}, \ldots$, with the same
law of errors (here $q, q^{(1)}, \ldots$ may, without loss of generality, be assumed
positive and increasing). If A + x is the true result, the errors of the first,
second, third ... observations are then $-x,\ q-x,\ q^{(1)}-x, \ldots$. Denoting
by φ(z) the probability of the error z (the same for each observation), we
see that the probability of the simultaneous existence of all the errors is

$$\varphi(-x)\,\varphi(q-x)\,\varphi(q^{(1)}-x)\cdots.$$

In considering the infinity of values of which x is supposed susceptible as the
causes of the observed event, we find, from Article 1, that “la probabilité de
chacune d’elles sera” [p. 339] or, as Todhunter [1865] has it, “the probability
that the true value lies between x and x + dx” [art. 1013] is¹²²

$$\varphi(-x)\,\varphi(q-x)\cdots dx \Big/ \int \varphi(-x)\,\varphi(q-x)\cdots dx,$$

the integral being taken over all values of x. Denote the denominator by
1/H.

Now consider a curve with abscissa x and ordinate

$$y = H\varphi(-x)\,\varphi(q-x)\cdots.$$

Laplace states quite baldly that


la valeur qu’il faut choisir pour résultat moyen est celle qui rend
l’erreur moyenne à craindre un minimum [p. 339],


and he goes on to say that the mean value of the error to be apprehended 
(“erreur a craindre” ) is the sum of the products of each error, all regarded 
as positive, by its probability. To determine the abscissa necessary to be 
chosen to minimize this sum, Laplace places a new origin at the left-hand 
end (“la premiere extrémité” ) of the curve, with co-ordinates now denoted 
by x′ and y′. If l is the value to be chosen then (see the discussion in §7.3
above)

$$\int_0^l y'\,dx' = \int_l^{\max x'} y'\,dx'.$$


Il suit de là que l’abscisse qui rend l’erreur moyenne à craindre
un minimum est celle dont l’ordonnée divise l’aire de la courbe
en deux parties égales. [p. 340]


This number is called the milieu de probabilité. Contrasting his value with
that given by earlier mathematicians, Laplace writes


des géomètres célèbres ont pris pour le milieu qu’il faut choisir
celui qui rend le résultat observé le plus probable, et par conséquent
l’abscisse qui répond à la plus grande ordonnée de la
courbe; mais le milieu que nous adoptons est évidemment indiqué
par la théorie des probabilités. [p. 340]
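
Laplace's claim that the abscissa bisecting the area under the curve minimizes the mean error to be feared, whereas the abscissa of the greatest ordinate in general does not, can be illustrated on a grid. The Python sketch below is an illustration only: the curve y ∝ x²(1 − x)⁸ on [0, 1], the grid of 1001 points and all function names are assumptions of ours and are not taken from Laplace.

```python
def mean_absolute_error(grid, weights, estimate):
    """Mean of |x - estimate| under the (unnormalized) weights."""
    total = sum(weights)
    return sum(w * abs(x - estimate) for x, w in zip(grid, weights)) / total


def probability_median(grid, weights):
    """First grid point at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    running = 0.0
    for x, w in zip(grid, weights):
        running += w
        if running >= half:
            return x
    return grid[-1]


def greatest_ordinate(grid, weights):
    """Grid point carrying the largest weight (the 'most probable' value)."""
    return max(zip(weights, grid))[1]


# Assumed example: ordinate proportional to x^2 (1 - x)^8 on [0, 1].
grid = [j / 1000 for j in range(1001)]
weights = [x ** 2 * (1 - x) ** 8 for x in grid]

median = probability_median(grid, weights)
mode = greatest_ordinate(grid, weights)
print(median, mean_absolute_error(grid, weights, median))
print(mode, mean_absolute_error(grid, weights, mode))
```

For a symmetric curve the two abscissae coincide; a skew curve is chosen here so that the difference between the two rules is visible.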


Supposing next that $\varphi(x) = \exp(-\psi(x^2))$ (i.e. assuming only that positive
and negative errors are equally likely), one has

$$y = H\exp\left(-\psi(x^2) - \psi\big((x-q)^2\big) - \psi\big((x-q^{(1)})^2\big) - \cdots\right).$$

Laplace then shows that, taking the “average of the results furnished by
observations as the most probable result” [Todhunter 1865, art. 1014], φ is
necessarily given by

$$\varphi(x) = \sqrt{\frac{k}{\pi}}\,e^{-kx^2}$$

(where k is constant). Laplace also notes the equivalence of the above assumption
with that of the method of least squares, with the words


cette valeur [de x qu’il faut choisir pour résultat moyen des observations]
est celle que donne la règle des milieux arithmétiques;
la loi précédente des erreurs de chaque observation donne donc
constamment les mêmes résultats que cette règle, et on a vu
qu’elle est la seule loi qui jouisse de cette propriété [p. 344]


and


la loi précédente des erreurs de chaque observation conduit donc
aux mêmes résultats que cette méthode [i.e. la méthode des
moindres carrés des erreurs des observations]. [p. 345]


Writing further of the method of least squares of errors, Laplace says
that it


devient nécessaire lorsqu’il s’agit de prendre un milieu entre
plusieurs résultats donnés, chacun, par l’ensemble d’un grand
nombre d’observations de divers genres. [p. 345]


A detailed discussion is given by Todhunter [1865, art. 1015], an argument
similar to that discussed earlier in the present article being employed¹²³.
Some history of the method of least squares is given in Article 24. 

The fifth chapter is entitled “Application du calcul des probabilités à la
recherche des phénomènes et de leurs causes”. It contains nothing pertinent
to our study, and we accordingly pass on immediately to Chapter VI, “De 
la probabilité des causes et des événements futurs, tirée des événements 
observés”. Here we find the alternative method of inverting Bernoulli’s 
Theorem suggested by Laplace in his third chapter. 

Since, Laplace argues, the probability of most simple events is unknown,
in considering this probability a priori, “elle nous parait susceptible de
toutes les valeurs comprises entre zéro et l’unité” [p. 370]. Calling the law
followed by the true possibility of the simple event x, Laplace notes that
the theory discussed in preceding chapters yields the probability of the
observed result as a function y of x. By the third principle of his first
article it follows that the probability of x (say $p_x$) is equal to


une fraction dont le numérateur est y, et dont le dénominateur 
est la somme de toutes les valeurs de y. [p. 370] 


Laplace then multiplies both numerator and denominator of this fraction
by dx, to get

$$p_x\,dx = y\,dx \Big/ \int_0^1 y\,dx.$$

Hence

$$\Pr[\theta < x < \theta'] = \int_\theta^{\theta'} y\,dx \Big/ \int_0^1 y\,dx.$$

Further, let a denote that value of x that maximizes y (i.e. a is the most
probable value of x).

Laplace next notes that if the values of x are not equally possible (considered
independently of the observed result), then on our denoting by z
“la fonction de x qui exprime leur probabilité” [p. 371], it follows from
Chapter I that

$$\Pr[\theta < x < \theta'] = \int_\theta^{\theta'} yz\,dx \Big/ \int_0^1 yz\,dx.$$

Since this amounts to considering all values of x as equipossible, with the
observed result as being formed from two independent results with prob- 
abilities y and z, Laplace proposes, in what follows, always to adopt the 
equipossible hypothesis. 

The results of Book I of the Théorie analytique des probabilités on the 
evaluation of definite integrals by approximations, are to be used here to 
determine the law of probability of the values of x as they deviate from 
the most probable value a. It is perhaps worth noting that, since Laplace 
is usually concerned with data drawn from a large number of observations, 
most of the integrands occurring in this chapter are of the form exp(—kt?). 

Laplace shows here [pp. 371-374] that 


lI 
ia) 
| 
os 
to 
2 
~~, 


Prla—tVJa/k<2<a+tVJa/k] 


where q@ is an extremely small fraction, k = (ey ox yt/2 and Y = 
y(«)|z=a - Notice the following observation: 


il résulte de cette expression que la valeur de x la plus probable
est a, ou celle qui rend l’événement observé le plus probable,
et qu’en multipliant à l’infini les événements simples dont
l’événement observé se compose, on peut à la fois resserrer les
limites $a \pm t\sqrt{\alpha}/k$, et augmenter la probabilité que la valeur de
x tombera entre ces limites; en sorte qu’à l’infini, cet intervalle
devient nul, et la probabilité se confond avec la certitude.

[p. 374] 


Attention is next focused on a double Bayes’s integral: supposing that
the observed event depends on simple events of two different types, of
possibilities x and x′ respectively, we find that

$$\Pr[\theta_1 < x < \theta_2,\ \theta_1' < x' < \theta_2'] = \int_{\theta_1}^{\theta_2}\!\int_{\theta_1'}^{\theta_2'} y\,dx'\,dx \Big/ \int_0^1\!\int_0^1 y\,dx'\,dx$$


(notation altered), where y denotes the probability of the observed com- 
pound event. Once again Laplace passes almost immediately from this ex- 
pression to one with integrand exp (—2? — u?). 

Then follows a brief comment to the effect that, in the drawing of a large
number n of balls from an urn containing balls of many colours, p of which
draws result in balls of the first colour, q of the second, r of the third etc.,
the probabilities x, x′, x″, ... that render the observed event most probable
are the observed sample frequencies.


Ainsi les valeurs les plus probables sont proportionnelles aux
nombres des arrivées des couleurs, et lorsque le nombre n est
un grand nombre, les probabilités respectives des couleurs sont
à très peu près égales aux nombres de fois qu’elles sont arrivées
divisés par le nombre des tirages. [p. 376]


Some examples involving Laplace’s method of approximation now follow:
in the first of these two players A and B play a match subject to the
condition that the first to win two out of three games wins the match. If
in a large number n of matches A has won i, then the probability that x,
the probability that A wins a game, lies between $a - r/\sqrt{n}$ and $a + r/\sqrt{n}$
(with a determined as before) turns out to be

$$\frac{6\sqrt{2}}{\sqrt{\pi(3-2a)(1+2a)}} \int_0^r \exp\left(-\frac{18r^2}{(3-2a)(1+2a)}\right) dr$$

(compare the method used in Chapter III). Various ramifications of this
situation follow: we shall not explore them here.

The second example is concerned with births, the sort of question
to which Laplace finds his preceding analysis chiefly applicable. Laplace
proposes to find the probability that the possibility x of the birth of a boy
in Paris exceeds 1/2, based on data¹²⁴ for the 40-year period 1745-1784. If p
and q denote the numbers of male and female births respectively, then

$$\Pr[0 < x < 1/2] = \int_0^{1/2} y\,dx \Big/ \int_0^1 y\,dx,$$

where $y = x^p(1-x)^q$. The right-hand side of this expression being approximately

$$\frac{(p+q)^{p+q+3/2}}{(p-q)\sqrt{\pi}\;2^{p+q+3/2}\,p^{p+1/2}\,q^{q+1/2}} \left[1 - \frac{p+q}{(p-q)^2} - \frac{(p+q)^2 - 13pq}{12pq(p+q)}\right],$$

we find that for the observed values p = 393,386 and q = 377,555,

$$\Pr[x < 1/2] = \frac{1}{\mu}\,(1 - 0.0030761),$$

where μ, a complicated function of p and q, has $\log_{10}\mu = 72.2511780$. Thus
the probability that, in Paris, the possibility of the birth of a boy exceeds
that of a girl is very close to 1, and hence


l’on voit que l’on doit regarder cette probabilité comme étant
égale, au moins, à celle des faits historiques les plus avérés.
[p. 387]
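
Laplace's figures for the Paris births can be checked directly. The Python sketch below is ours (the function names are hypothetical). It evaluates the leading factor (p+q)^(p+q+3/2) / [(p−q) √π 2^(p+q+3/2) p^(p+1/2) q^(q+1/2)] and the bracketed correction 1 − (p+q)/(p−q)² − [(p+q)² − 13pq]/[12pq(p+q)], working throughout with logarithms to avoid overflow. It assumes that the tabulated logarithm is to base 10, an assumption that the computation appears to bear out.

```python
import math


def log10_mu(p, q):
    """Base-10 logarithm of mu, where 1/mu is the leading factor of the approximation."""
    n = p + q
    ln_leading = ((n + 1.5) * math.log(n) - math.log(p - q) - 0.5 * math.log(math.pi)
                  - (n + 1.5) * math.log(2.0)
                  - (p + 0.5) * math.log(p) - (q + 0.5) * math.log(q))
    return -ln_leading / math.log(10.0)


def bracket(p, q):
    """The correction factor multiplying 1/mu."""
    n = p + q
    return 1.0 - n / (p - q) ** 2 - (n * n - 13.0 * p * q) / (12.0 * p * q * n)


boys, girls = 393_386, 377_555  # Paris, 1745-1784, as quoted in the text
print(log10_mu(boys, girls), bracket(boys, girls))
```

The first printed value should lie close to Laplace's 72.2511780, and the second close to 1 − 0.0030761, the term (p+q)/(p−q)² supplying almost all of the correction.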


In his Metretike of 1887 Edgeworth suggests that this problem could 
also be approached from another point of view. First, the two alternatives 
“Chance” and “Law” should be distinguished. Then the probability of the 
birth of a boy should be taken as 1/2, under the assumption that sex is not 
causally connected with the proportion of births. The Law of Error then 
dictates that the conditional probability of the observed event given that 
Chance alone is operative, should be very small; and hence the a posteriori 
probability that Chance alone is operative is very small. Yet under the 
assumption that the antecedent probabilities are equal, the cause must be 
that which favours male births. 


Of course, where the object is to prove, not only that the real 
possibility is greater than 1/2, but also how much greater it is 
there, the method which Laplace appears to prefer is specially 
required. But, where the fact rather than the degree of excess is 
the object of inquiry, there the ground of choice must be found, 
if at all, in the data rather than the quaesitum of the problem. 
[Edgeworth, 1887]


The third problem, discussed in Section 29, is also concerned with births:
more precisely, having noticed that the ratios of male to female births are
19 : 18 in London and 25 : 24 in Paris¹²⁵, Laplace proposes to determine
the probability of the constant cause to which he attributes the difference
in the ratios. Denoting by p and q the numbers of baptisms of boys and
girls respectively in Paris, and by p′ and q′ the similar numbers in London,
he shows that the probability that the possibility of the baptism of a boy
is greater in London than in Paris is approximately given (for large values
of p, q, p′ and q′) by

$$\frac{k}{\sqrt{\pi}} \int_0^\infty e^{-k^2(t-h)^2}\,dt,$$

where

$$k^2 = (p+q)^3(p'+q')^3 \Big/ \left[2p'q'(p+q)^3 + 2pq(p'+q')^3\right],$$

$$h = (p'q - pq') \big/ (p+q)(p'+q').$$

Substitution of the values 


p = 393,386, q = 377,555,
p′ = 737,629, q′ = 698,958


shows that it is 328,268 to one that the possibility of the baptism of a boy is
greater in London than in Paris. (Laplace notes further that the baptism of
foundlings in Paris turns out to have a sensible effect on the ratio observed
in that city.)
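
The odds quoted by Laplace may be recovered, at least approximately, from the normal approximation. In the Python sketch below (ours; the function name is hypothetical) we take k² = (p+q)³(p′+q′)³ / [2p′q′(p+q)³ + 2pq(p′+q′)³] and h = (p′q − pq′)/[(p+q)(p′+q′)], and we assume that the integral runs from 0 to ∞, in which case it reduces to 1 − erfc(kh)/2.

```python
import math


def odds_london_over_paris(p, q, p1, q1):
    """Odds that the possibility of a boy is greater in London than in Paris.

    p, q   : baptisms of boys and girls in Paris
    p1, q1 : baptisms of boys and girls in London
    """
    n, n1 = p + q, p1 + q1
    k2 = (n ** 3) * (n1 ** 3) / (2.0 * p1 * q1 * n ** 3 + 2.0 * p * q * n1 ** 3)
    h = (p1 * q - p * q1) / (n * n1)
    # (k / sqrt(pi)) * integral_0^inf exp(-k^2 (t - h)^2) dt = 1 - erfc(k h) / 2
    prob_against = 0.5 * math.erfc(math.sqrt(k2) * h)
    return (1.0 - prob_against) / prob_against


odds = odds_london_over_paris(393_386, 377_555, 737_629, 698_958)
print(round(odds))
```

The complementary error function is used rather than 1 − erf so that the very small probability of the contrary event is not lost to rounding.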

Czuber later gave the same formula as Laplace, but in a more general 
setting, phrasing the problem as follows: 


Zwei Massen von Individuen seien auf dasselbe Ereignis F hin 
beobachtet worden; die Ergebnisse dieser Beobachtungen seien 
durch die Zahlen s, m, n in dem einen und durch s’, m’, n’ 
in dem andern Falle dargestellt. Daraus sind die empirischen 


Werte 
eee ol = 
s s! 
der Wahrscheinlichkeiten p, p’ abgeleitet worden; dieselben 
mogen die positive Differenz /‘'—/ = 6 ergeben. Wie grof ist auf 


Grund dieser Wahrnehmung die Wahrscheinlichkeit, daß p′ > p
sei? [1921, §237] 


This theory is then applied to Laplace’s example¹²⁶, and this application
in turn is followed by one concerning legitimate births in Austria from
1878 to 1894. Of these, s = m+n = 12,695,948 children were born alive, 
m = 6,533,961 being boys. Similarly, s’ = m’ + n’ = 332,306 were born 
dead, of whom m’ = 191,159 were boys. On applying his earlier result 
Czuber deduces 


daß man fast mit Gewißheit aussagen kann, der Knabengeburt
liege bei Totgeborenen eine größere Wahrscheinlichkeit zugrunde
als bei den Lebendgeborenen. [1921, §237]


The examples discussed in Sections 30 and 31 refer to the probabilities of 
future events, and thus, as Todhunter [art. 1029] has noted, they are more 
logically placed after Section 32. We shall accordingly postpone discussion 
of them for the moment, turning rather to the problems of Section 32, 
where we find discussed¹²⁷


la probabilité des événements futurs, tirée des événements ob- 
servés, et supposons qu’ayant observé un événement composé 
d’un nombre quelconque d’événements simples, on cherche la 
probabilité d’un résultat futur, composé d’événements sembla- 
bles. [p. 401] 


Denoting by x the probability of each simple event, by y the probability 
of the observed result, and by z that of the future result, Laplace states 
that P, “la probabilité entière de l’événement futur”, is given by

$$P = \int_0^1 yz\,dx \Big/ \int_0^1 y\,dx.$$


No proof of this result is presented: we provide the following heuristic ar- 
gument as corroboration. 

Let $\varphi_m = \varphi(E_{i_1}, E_{i_2}, \ldots, E_{i_m})$ be the observed result depending on
the simple events $E_{i_1}, E_{i_2}, \ldots, E_{i_m}$, and $\psi_n = \psi(E_{i_1}, E_{i_2}, \ldots, E_{i_n})$ be the
future result. Let $H_i = [x_{i-1}, x_i)$, where $i \in \{1, 2, \ldots, N-1\}$, $H_N = [x_{N-1}, x_N]$,
and $0 = x_0 < x_1 < \cdots < x_N = 1$: the $H_i$ are mutually exclusive
and exhaustive, with $\bigcup_{i=1}^{N} H_i = [0, 1]$. Finally, let $\Pr[H_i] = x_i - x_{i-1} = \Delta x$
(i.e. an assumption of uniformity) and let $y_m^{(i)}(\xi)$ and $z_n^{(i)}(\xi)$ be given,
for any $\xi \in H_i$, by

$$y_m^{(i)}(\xi) = \Pr[\varphi_m \mid H_i], \qquad z_n^{(i)}(\xi) = \Pr[\psi_n \mid H_i].$$

Then

$$\Pr[\varphi_m] = \sum_{i=1}^{N} \Pr[\varphi_m \mid H_i]\,\Pr[H_i] = \sum_{i=1}^{N} y_m^{(i)}(\xi)\,\Delta x,$$

and by the usual sort of limiting argument (refining the partition), we
obtain

$$\Pr[\varphi_m] = \int_0^1 y_m(x)\,dx.$$

Similarly, assuming $\Pr[\varphi_m \psi_n \mid H_i] = \Pr[\varphi_m \mid H_i]\,\Pr[\psi_n \mid H_i]$, we have

$$\Pr[\psi_n \varphi_m] = \int_0^1 y_m(x)\,z_n(x)\,dx.$$

Thus

$$\Pr[\psi_n \mid \varphi_m] = \int_0^1 y_m(x)\,z_n(x)\,dx \Big/ \int_0^1 y_m(x)\,dx. \tag{57}$$


As a first example illustrating the use of this result, Laplace supposes
that an event has happened m times running. If the probability of the
simple event is x, the probability that the event will occur the next n times
is

$$P = \int_0^1 x^{m+n}\,dx \Big/ \int_0^1 x^m\,dx = (m+1)/(m+n+1),$$

a result that we recognize as the rule of succession. It follows (though
Laplace does not give this extension here) that if

$$y(x) = x^m(1-x)^n, \qquad z(x) = x^p(1-x)^q,$$

then

$$P = \frac{\Gamma(m+p+1)\,\Gamma(n+q+1)\,\Gamma(m+n+2)}{\Gamma(m+n+p+q+2)\,\Gamma(m+1)\,\Gamma(n+1)}. \tag{58}$$

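
For integral exponents both the rule of succession and its extension (58) can be evaluated exactly, since the integral of x^a (1 − x)^b over [0, 1] is a! b!/(a + b + 1)!. The Python sketch below is ours (the function names are hypothetical) and uses exact rational arithmetic, so that no approximation enters.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact integral of x^a (1 - x)^b over [0, 1] for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive_probability(m, n, p, q):
    """Expression (58): observed y(x) = x^m (1-x)^n, future z(x) = x^p (1-x)^q."""
    return beta_integral(m + p, n + q) / beta_integral(m, n)


def rule_of_succession(m, n):
    """Probability of n further occurrences after m occurrences running."""
    return predictive_probability(m, 0, n, 0)


print(rule_of_succession(5, 1))            # (m + 1)/(m + n + 1) with m = 5, n = 1
print(predictive_probability(3, 2, 1, 0))  # one further success after 3 successes, 2 failures
```

Note that z(x) is taken here, as in (58), without a binomial coefficient, so that it is the probability of a specified order of future successes and failures.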

Laplace next supposes that the observed event is composed of a vast
number of simple events. If the future event is composed of relatively few
simple events, then P = Z (approximately), where Z is the value of z(x)
for x = a, the value of x that maximizes y(x). (This can be seen in a special
case by applying the Stirling-de Moivre approximation to the factorials in
(58) above: Laplace’s argument is more general.) If, next, the future result
is a function of the observed result, so that z = φ(y), say, then it turns out
that

$$P = \varphi(Y) \Big/ \sqrt{1 + Y\varphi'(Y)/\varphi(Y)},$$

where Y = y(a). In particular, if $\varphi(y) = y^n$, then

$$P = Y^n/\sqrt{1+n}.$$

On the other hand, the probability of the future result, given x = a, is $Y^n$.
Thus, notes Laplace,


on voit ainsi que les petites erreurs qui résultent de cette supposition
[i.e. that x = a] s’accumulent à raison des événements
simples qui entrent dans le résultat futur, et deviennent très
sensibles lorsque ces événements sont en grand nombre. [p. 404]
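
The effect that Laplace describes can be seen numerically. In the Python sketch below (ours; the counts 6000 and 4000 and the value n = 3 are assumptions chosen only for illustration) the observed result is y = x^s (1 − x)^f and the future result is z = y^n, so that the exact value of P is a ratio of Beta functions. This is compared, on the logarithmic scale, with the approximation Y^n/√(1 + n) and with the naive value Y^n obtained by supposing x = a.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_exact(s, f, n):
    """log of  int y^(n+1) dx / int y dx  for y = x^s (1 - x)^f on [0, 1]."""
    return log_beta(s * (n + 1) + 1, f * (n + 1) + 1) - log_beta(s + 1, f + 1)


def log_laplace(s, f, n):
    """log of Y^n / sqrt(1 + n), Y being the maximum of y, attained at a = s / (s + f)."""
    a = s / (s + f)
    log_Y = s * math.log(a) + f * math.log(1.0 - a)
    return n * log_Y - 0.5 * math.log(1.0 + n)


s, f, n = 6000, 4000, 3  # assumed counts, for illustration only
print(log_exact(s, f, n), log_laplace(s, f, n))
```

The two logarithms should agree closely, while the naive value differs from both by (1/2) log(1 + n), a discrepancy that grows with the number of simple events entering the future result.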


In Section 30 Laplace turns his attention to the probabilities of results 
based on tables of mortality or assurance, constructed from a large number 
of observations. He supposes firstly that 


sur un nombre p d’individus d’un âge donné A, on ait observé
qu’il en existe encore le nombre q à l’âge A + a; on demande la
probabilité que, sur p′ individus de l’âge A, il en existera q′ + z
à l’âge A + a, la raison de p′ et q′ étant la même que celle de p
à q. [p. 392]


The solution to this problem is clearly given by formula (57), with

$\varphi_m$ : of p people aged A, q survive to age A + a;
$\psi_n$ : of p′ people aged A, q′ + z survive to age A + a;

$$y_m = \Pr[q \text{ people survive to age } A+a \mid p \text{ survive to age } A] = \binom{p}{q}\,x^q(1-x)^{p-q};$$

$$z_n = \Pr[q'+z \text{ people survive to age } A+a \mid p' \text{ survive to age } A] = \binom{p'}{q'+z}\,x^{q'+z}(1-x)^{p'-q'-z},$$

where x denotes the probability that an individual of age A survives to age
A + a. Thus, by (57),

$$P = \Pr[\psi_n \mid \varphi_m] = \binom{p'}{q'+z} \int_0^1 x^{q+q'+z}(1-x)^{p+p'-q-q'-z}\,dx \Big/ \int_0^1 x^q(1-x)^{p-q}\,dx.$$


Laplace then considers approximations to this value. 
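
Before any approximation is made, the value of P can be computed exactly for small numbers. The Python sketch below is ours: the function names are hypothetical, k stands for q′ + z, and the values p = 20, q = 15, p′ = 8 are assumptions chosen only for illustration. It evaluates the ratio of integrals in exact rational arithmetic, which permits a check that the probabilities of all possible numbers of survivors sum to 1.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of x^a (1 - x)^b over [0, 1] for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def survivors_predictive(p, q, p_new, k):
    """Pr[k of p_new individuals survive | q of p survived], all values of x equipossible."""
    numerator = comb(p_new, k) * beta_integral(q + k, p + p_new - q - k)
    return numerator / beta_integral(q, p - q)


p, q, p_new = 20, 15, 8  # assumed small numbers, for illustration only
distribution = [survivors_predictive(p, q, p_new, k) for k in range(p_new + 1)]
print([float(pr) for pr in distribution])
```

With p′ = 1 the formula reduces to (q + 1)/(p + 2), the familiar form of the rule of succession.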

He next supposes that, of p individuals aged A, q live to age A + a and
r to age A + a + a′: what is the probability that, of p′ individuals of age
A, (qp′/p) + z and (rp′/p) + z′ will survive to ages A + a and A + a + a′
respectively? This probability is found in the same manner as that just
discussed, and, extending the procedure somewhat, Laplace shows that


l’expression précédente de P est donc la probabilité que les er- 
reurs de q’,r’,s’,... sont comprises dans les limites zéro et z, 
zéro et z’, zéro et z”, etc. [p. 397] 


In Section 31 Laplace applies his analysis to the question of the error
incurred in the determination of the population of a large empire, based on
the numbers of births¹²⁸. He reduces this to an urn problem, supposing that
from an urn containing an infinite number of white and black balls, p draws
are made and q of these yield white balls. A second series of draws is then
made, q′ of these resulting in white balls. Assuming that the unknown ratio
of white to black balls in the urn initially is x : 1, we require the probability
that the number of balls drawn in the second series lies within the limits
(pq′/q) ± z. Proceeding exactly as in the preceding section, Laplace shows
that this probability is approximately given by

$$P = 1 - \frac{2}{\sqrt{2\pi\sigma^2}} \int_z^\infty e^{-u^2/2\sigma^2}\,du,$$

where $\sigma^2 = pq'(p-q)(q+q')/q^3$.

A more careful look at this problem may prove instructive¹²⁹. The procedure
adopted was the following: thirty departments were chosen from the
whole of France in such a way as to compensate for climatic vagaries. In 
each of these a number of parishes (or townships) were chosen 


dont les maires, par leur zele et leur intelligence, pouvaient 
fournir les renseignements les plus précis. [p. 399] 


There, on the 22nd of September 1802 (the last day of An X in the Republican
Calendar¹³⁰), censuses were taken, a total of 2,037,615 people being
enumerated. In addition, the following summary of births, marriages and
deaths from 22nd September 1799 to 22nd September 1802 was obtained:

Births          Marriages   Deaths
110,312 boys    46,037      103,659 men
105,287 girls               99,443 women

the ratio of male to female births being 22 : 21. Further, the ratio of the 
population to annual births was 28.352845 (to 1, presumably). 

In the first edition of the Théorie analytique des probabilités Laplace
supposes at this point that the annual number of births in France is one and
a half million, a figure that is changed¹³¹ to one million in the third edition
of 1820. The total population is then estimated at 42,529,267 (28,352,845
in 1820).
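
The arithmetic behind these estimates is easily retraced. The Python sketch below is ours (the variable and function names are hypothetical). It recovers the ratio of population to annual births from the census total of 2,037,615 and the three years' births (110,312 boys and 105,287 girls), and then multiplies this ratio by the supposed annual number of births.

```python
from fractions import Fraction

population_counted = 2_037_615          # census of 22 September 1802
births_three_years = 110_312 + 105_287  # boys and girls, 1799 to 1802

# Ratio of the population to the annual births in the sampled parishes.
ratio = Fraction(population_counted * 3, births_three_years)


def estimated_population(annual_births):
    """Population implied by a supposed annual number of births."""
    return float(ratio * annual_births)


print(float(ratio))
print(estimated_population(1_500_000), estimated_population(1_000_000))
```

The printed values should agree, to within a unit, with the figures 28.352845, 42,529,267 and 28,352,845 given in the text.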

Laplace next examines the error that might be made in proposing such 
an estimate. As we have already noted, this examination is effected by 
considering the population model as an urn model, and using the expression 
for P given above. However, as Westergaard [1968, p. 82] has observed, 
Laplace’s result is weakened by his taking q’ as 1.5 million in both the 1812 
and the 1820 editions, and his consequent obtaining of the result that 


Il y a donc environ 1161 à parier contre un qu’en fixant à
42,529,267 la population correspondante à quinze cent mille
naissances, on ne se trompera pas d’un demi-million. [p. 401]


(It is interesting to note that a correction was made in the introductory
Essai philosophique sur les probabilités; the figure 1,161 of the first edition
became 3,000,000 in the fifth.)

It is here that Westergaard believes that a number of doubts should
arise, viz.¹³²


1. Was it in fact possible for Laplace to know that the normal number 
of births in the newly-defined France was one million or that it had 
been one and a half million previously? 


2. The actual numbers of births, etc. in the various parishes should be 
presented, so that it can be ascertained whether the birth-rates have 
been grouped following the binomial distribution or whether they are 
grouped around various centres in the country. 


3. Should the number of inhabitants not have been taken in the middle 
of the period 1799 to 1802? 


Despite these objections Westergaard concludes that Laplace’s solution was 
indeed important, though perhaps its possibilities were not fully appreci- 
ated by all. 

As before, Czuber also considers this same question, but again in a per- 
haps more general setting. He frames the problem as follows: 


Von s beobachteten Einzelfällen haben m Fälle den Verlauf E,
n = s − m den entgegengesetzten Verlauf genommen. In einer
zweiten Beobachtungsreihe sei bloß die Zahl m′ der Wiederholungen
von E erhoben worden. Es ist die Wahrscheinlichkeit zu
bestimmen, daß die zugehörige Zahl n′ innerhalb bestimmter
Grenzen liege. [1921, §238]


The solution obtained is essentially that given by Laplace, and Czuber ap- 
plies his result to the determination of the number of live female births 
registered in Austria in 1877 to 1894, given (a) the number of live male 
births registered in that period, and (b) the numbers both of male and of 
female live births registered from 1866 to 1877. 

In Section 33 Laplace continues his study of births. It had been noted
that in Paris, over a number of years, the number of registrations of baptisms
of boys exceeded that of girls. Laplace proposes here to determine
the probability that this superiority will be maintained for a given period
(for example, a century). Denoting by 2n the number of annual baptisms,
of which p are of boys and q of girls, and by x the probability that an infant
about to be born and baptised will be a boy, one finds that the probability
that in each year the number of baptisms of boys will exceed that of girls
is the sum z of the first n terms of the series

$$x^{2n} + 2n\,x^{2n-1}(1-x) + \frac{2n(2n-1)}{1\cdot 2}\,x^{2n-2}(1-x)^2 + \cdots.$$

Then $z^i$ is the probability that this superiority will be maintained for i
consecutive years, and the probability P, given the pertinent data, that
this superiority is maintained for i years is, by the formula of Section 32,

$$P = \int_0^1 x^p(1-x)^q z^i\,dx \Big/ \int_0^1 x^p(1-x)^q\,dx.$$

Using the data obtained from the years 1745-1784, during which p =
393,386 and q = 377,555, Laplace finds by appropriate approximation
that P = 0.782.

We now turn to Chapter VII, entitled “De l’influence des inégalités incon- 
nues qui peuvent exister entre des chances que l’on suppose parfaitement 
égales”. The chief topic of concern is the question of the tossing of a coin 
known to be biased (though whether towards heads or tails is uncertain), 
a topic that Laplace had considered earlier¹³³.

In the final paragraphs of this chapter we find integrals reminiscent of
those of the preceding chapter. These arise in the following way: let P denote
the probability of a compound event composed of two simple events
of probabilities p and 1 − p. Suppose further that p is susceptible of an
unknown error z that can take on values in [−a, +a], and let φ be the
probability of p + z. Then, says Laplace, one will have “pour la vraie probabilité
de l’événement composé” [p. 415]

$$\int P'\varphi\,dz \Big/ \int \varphi\,dz, \tag{59}$$

where P′ is what P becomes on substitution of p + z for p.

The derivation of this result, as expounded by Laplace (and as just indicated),
is by no means clear to me. It seems, though, that it can be deduced
in the same fashion as the expression (57), with $y_m$ and $z_n$ replaced by φ
and P′ respectively.

Laplace next goes on to say that, if z is determined only by an observed
event (formed of the same simple events) of probability Q, then the probability
of the compound event will be

$$\int_{-p}^{1-p} P'Q\,dz \Big/ \int_{-p}^{1-p} Q\,dz. \tag{60}$$

He then concludes that “ce qui est conforme à ce que nous avons trouvé
dans le Chapitre précédent” [p. 415]. While the ratios given here are certainly
of the same form as (57), it must be remembered that this latter
expression refers to future events, which is not the case in (59) and (60)
above; though if we let Q and P′ in (59) and (60) correspond respectively to
the probabilities of $\varphi_m$ (the observed result) and of $\psi_n$ (the future result)
in the discussion leading to (57), we see that (59) and (60) do in fact agree
with (57) (some extra investigation of the limits is perhaps called for).

In Section 35, the first section of Chapter VIII, “Des durées moyennes
de la vie, des mariages et des associations quelconques”, Laplace discusses
the mean duration of life of n infants, where n is very large; and he finds
the probability that the sum of the ages attained by n infants lies within
given limits. In Section 36 Laplace continues his study of mortality, concerning
himself now with the mean duration of life when one of the causes
of mortality dies out.

In Section 37 we find a discussion that more nearly concerns us, a
discussion of the mean duration of marriages¹³⁴. Laplace’s statement of
the problem is as follows: suppose that a large number n of marriages are
entered into between lads of age a and lasses of age a′. Let us determine
how many marriages are still going strong after x years¹³⁵.

Todhunter [1865, art. 1036] finds Laplace’s investigation of this problem
“very obscure”: nevertheless we shall try to present the latter’s solution in
as lucid a manner as possible, before discussing the alternatives presented
by Todhunter. If φ and ψ denote respectively the probabilities that a boy
and a girl who married at ages a and a′ will reach ages a + x and a′ + x,
then the probability¹³⁶ that their marriage will last to the x-th year is φψ.
Thus¹³⁷

$$\Pr[i \text{ out of } n \text{ marriages will last } x \text{ years}] = H(\varphi\psi)^i(1-\varphi\psi)^{n-i}.$$

The next problem is the estimation of the product φψ. To this end
Laplace refers to his §16 [p. 281], where he showed that, in such a binomial
situation, the greatest term in the expansion is that in which the value of
i is given by

$$i = [(n+1)\varphi\psi],$$

where [x] here denotes the integral part of x. By the same article, Laplace
says, it is extremely probable that the number of marriages that last differs
only very slightly from that number, i.e. that this is the most probable
number. Thus (again by §16) it follows that i = nφψ approximately.
From mortality tables, suppose we can find p′ (the number of men living
at age a) and q′ (the number surviving to age a + x): then, approximately,

$$n\varphi = nq'/p'.$$

Similarly ψ = q″/p″, and thus¹³⁸

$$i = nq'q''/p'p''.$$

This, then, is the “best” estimate of i; and having found it, Laplace goes
on to consider the problem of finding


la probabilité que l’erreur de la valeur précédente de i sera comprise
dans des limites données. [p. 424]


I am forced to agree with Todhunter that this investigation is “very obscure”,
and fearing that too slavish an exposition of the original might but
render obscurum per obscurius, I choose rather to present the argument as
I see it.

Let us suppose, as Laplace initially did, for ease of calculation that 
a=a',q’=q',p" =p’, p=. Then the value 7 of J (a random variable) 
found above becomes 

i=n(q'/p') 


Now if of a large number p of individuals of age a, q are alive at agea+a, 
then, by Article 30, the probability that of p’ other individuals of age a, Z 
will reach age a + z 1s such that 


fz(z) dz 


l| 


Pr {(p'q/p) + 2 < Z < (p'a/p) +24 az] 


l| 


pre~@dz / 2m qp'(p — q)(p+ p’) 


where Q = p%z? /2qp'(p — q)(p + p’) . If one supposes p and q very large!39, 
then y = q/p, and hence 


fz(z)dz=e-® dz | /2n p’p(1 — ¢) 
where ® = z?/2p'y(1 — y) (since 1+ (p'/p) = 1). 
Suppose next that, conditional on the value of Z, $I \sim b(n, \phi^2)$. Then,
by Article 16,

$$ f_{I|Z}(l \mid z)\,dl = \Pr[n\phi^2 + l < I < n\phi^2 + l + dl \mid Z = z]
   = e^{-L}\,dl \Big/ \sqrt{2\pi\, n\phi^2(1-\phi^2)}, $$

where $L = l^2 / 2n\phi^2(1-\phi^2)$. Recalling now that $\phi = q/p$ and setting
$q' = (p'q/p) + z$, we find that $\phi = (q' - z)/p'$. Thus, on neglecting terms involving
$z^2$, we have $n\phi^2 = n(q'/p')^2 - 2nq'z/p'^2$. If we now put $s = l - 2nq'z/p'^2$,
then

$$ \begin{aligned}
f_{I|Z}(s \mid z)\,ds
 &= \Pr[n(q'/p')^2 - 2nq'z/p'^2 + l < I < n(q'/p')^2 - 2nq'z/p'^2 + l + dl \mid Z = z] \\
 &= \Pr[n(q'/p')^2 + s < I < n(q'/p')^2 + s + ds \mid Z = z] \\
 &= e^{-S}\,ds \Big/ \sqrt{2\pi\, n\phi^2(1-\phi^2)},
\end{aligned} $$

where $S = (s + 2nq'z/p'^2)^2 / 2n\phi^2(1-\phi^2)$. It then follows that

$$ \Pr[n(q'/p')^2 + s < I < n(q'/p')^2 + s + ds,\ (p'q/p) + z < Z < (p'q/p) + z + dz]
   = f_{I|Z}(s \mid z)\, f_Z(z)\,ds\,dz, $$


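and hence

$$ \begin{aligned}
\Pr[s_0 < I - n(q'/p')^2 < s_1]
 &= \int_{s_0}^{s_1}\!\int_{-\infty}^{\infty} f_{I|Z}(s \mid z)\, f_Z(z)\,dz\,ds \\
 &= \left[2\pi\sqrt{np'\phi^3(1-\phi)(1-\phi^2)}\right]^{-1}
    \int_{s_0}^{s_1}\!\int_{-\infty}^{\infty} e^{-(\Phi+S)}\,dz\,ds. \qquad (61)
\end{aligned} $$

On setting

$$ k^2 = p' \big/ \{2n\phi^2(1-\phi)\,[p' + (p'+4n)\phi]\} $$

we obtain

$$ \Pr[s_0 < I - n(q'/p')^2 < s_1] = \frac{k}{\sqrt{\pi}}\int_{s_0}^{s_1} e^{-k^2s^2}\,ds, \qquad (62) $$

and hence

$$ \Pr[\,|I - n(q'/p')^2| < s_0\,] = \frac{2k}{\sqrt{\pi}}\int_0^{s_0} e^{-k^2s^2}\,ds. $$

(The reduction of the double integral in (61) to (62) is achieved by setting
$\phi = q'/p'$.)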
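The constant k² can be checked by assembling the variance of I − n(q'/p')² from its two sources: the binomial variation of I given Z, and the variation of Z itself carried through the linear term 2nq'z/p'². The following Python sketch does this under the normal approximations used above; the function names and the numerical values are illustrative assumptions and come from neither Laplace nor Todhunter.

```python
import math

def k_squared(n, p1, phi):
    """k^2 = p' / {2 n phi^2 (1 - phi) [p' + (p' + 4n) phi]}; p1 stands for p'."""
    return p1 / (2 * n * phi**2 * (1 - phi) * (p1 + (p1 + 4 * n) * phi))

def variance_of_s(n, p1, phi):
    """Variance of I - n(q'/p')^2 built from the two normal densities."""
    within = n * phi**2 * (1 - phi**2)         # variance of I given Z
    slope = 2 * n * phi / p1                   # 2nq'/p'^2 with q'/p' = phi
    between = slope**2 * p1 * phi * (1 - phi)  # slope^2 times the variance of Z
    return within + between

def prob_within(s0, n, p1, phi):
    """(2k/sqrt(pi)) * integral from 0 to s0 of exp(-k^2 s^2) ds, i.e. erf(k * s0)."""
    return math.erf(math.sqrt(k_squared(n, p1, phi)) * s0)

# Illustrative values only.
n, p1, phi = 1000, 5000, 0.8
print(1 / (2 * k_squared(n, p1, phi)), variance_of_s(n, p1, phi))
print(prob_within(25, n, p1, phi))
```

The two printed variances agree, which is simply the statement that 1/(2k²) is the sum of the two variance components.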
L’analyse précédente s’applique également à la durée moyenne
d’un grand nombre d’associations formées de trois individus ou
de quatre individus, etc. [p. 426]
Todhunter presents three alternative solutions to the problem. In the first
of these he supposes that q'/p', a ratio of observed frequencies obtained
from mortality tables, is the probability of a specified individual’s being
alive at age a + x. Then the probability of a specified pair being alive is
$T = (q'/p')^2$, and thus the probability that of the n original marriages i are
still unbroken is

$$ \binom{n}{i}\, T^i (1-T)^{n-i}. $$

The replacement of the probability that a specified individual is alive at age
a + x by the observed ratio q' : p' is seen by Todhunter as an assumption
“analogous to what we have called an inverse use of James Bernoulli’s
theorem” [art. 1036].


For his second alternative Todhunter relies on “the usual principles of
inverse probability as given by Bayes and Laplace” [art. 1036]. To this end
he uses the formula given earlier by Laplace, viz. $P = \int_0^1 yz\,dx \big/ \int_0^1 y\,dx$,


with 
i iy Ger. 


Then the desired probability P is given by 


1 


P= (") / wal — z)P (2?)' (1 - 2?) dx /I x? (1— x) dx. (63) 


0 0 


An exact evaluation of this ratio, in terms of Eulerian integrals of the first 
kind!*°, yields 


P=S\(" BG! +2+htlptn—itl) | 
k=0 


Tables of the gamma-function allow the finding of a numerical value of P
for given values of the variables. Todhunter, however, suggests a different
method of evaluating (63) above; he replaces $\binom{n}{i}(x^2)^i(1-x^2)^{n-i}$ by

$$ [2\pi n x^2(1-x^2)]^{-1/2} \exp\!\left(-r^2/2nx^2(1-x^2)\right), $$

where r is not large, and shows eventually that the probability that the
number of surviving marriages lies in the interval

$$ \left[\,na^2 - \tau\sqrt{2na^2(1-a^2)}\,,\ na^2 + \tau\sqrt{2na^2(1-a^2)}\,\right], $$

where $a = q'/p'$, is approximately given by

$$ \frac{2}{\sqrt{\pi}}\int_0^{\tau} e^{-t^2}\,dt + [2\pi na^2(1-a^2)]^{-1/2}\exp(-\tau^2). $$


Todhunter’s final solution requires that one know, from observation, that
of $m_1$ marriages at age a, $n_1$ last to age a + x. The probability, then, that
of n marriages i survive for the same period is, as it was in the preceding
solution,

$$ P = \binom{n}{i}\int_0^1 x^{n_1+i}(1-x)^{m_1-n_1+n-i}\,dx \Big/ \int_0^1 x^{n_1}(1-x)^{m_1-n_1}\,dx, $$

a ratio that may be evaluated as before.
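A ratio of this kind is a quotient of two Eulerian integrals of the first kind, and so may be computed directly from the log-gamma function. The sketch below assumes the usual Bayes-Laplace form (uniform prior; n₁ of m₁ observed marriages lasting; i of n further marriages to last); the function names `log_beta` and `predictive` are introduced here for illustration only.

```python
import math

def log_beta(a, b):
    """Logarithm of the Eulerian integral of the first kind B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def predictive(i, n, n1, m1):
    """Pr[i of n new marriages last | n1 of m1 observed marriages lasted].

    Computed as C(n, i) * B(n1 + i + 1, m1 - n1 + n - i + 1) / B(n1 + 1, m1 - n1 + 1).
    """
    log_p = (math.log(math.comb(n, i))
             + log_beta(n1 + i + 1, m1 - n1 + n - i + 1)
             - log_beta(n1 + 1, m1 - n1 + 1))
    return math.exp(log_p)

# Illustrative data: 30 of 40 observed marriages lasted; 20 new marriages.
probs = [predictive(i, 20, 30, 40) for i in range(21)]
print(sum(probs))                 # the probabilities over i = 0, ..., n sum to one
print(predictive(1, 1, 30, 40))   # a single new marriage: (n1 + 1)/(m1 + 2)
```

For n = 1 the expression reduces to (n₁ + 1)/(m₁ + 2), the familiar rule of succession.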

In concluding our discussion of this section of the Théorie analytique des
probabilités, we note that Laplace’s solution requires the estimation of $\phi$
as q'/p' (as in Todhunter’s first solution), and then the use of the formula
used by Todhunter in his second solution.

The problem considered in Chapter IX, “Des bénéfices dépendants de la
probabilité des événements futurs”, is stated at the outset of §38 as follows:

concevons que l’arrivée d’un événement procure le bénéfice ν,
et que sa non-arrivée cause la perte μ. Une personne A attend
l’arrivée d’un nombre s d’événements semblables, tous
également probables, mais indépendants les uns des autres; on
demande quel est son avantage. [p. 428]

Laplace shows firstly that A’s advantage is zero if $\nu q = \mu(1-q)$, where
q is the probability of the occurrence of each event, and then goes on to
show that, if s is large, the probability that A’s real benefit lies within the
limits $s[q\nu - (1-q)\mu] \pm r\sqrt{s}\,(\mu+\nu)$ is

$$ \frac{2}{\sqrt{2\pi q(1-q)}}\int_0^r e^{-r^2/\theta}\,dr + \frac{e^{-r^2/\theta}}{\sqrt{2s\pi q(1-q)}}, $$

where $\theta = 2q(1-q)$. This analysis is then extended to the case in which
the (initial) probabilities of the s events, as well as the attendant gains and
losses, are different¹⁴¹.

Further modification is undertaken, in §39, in supposing that, at each
trial (“événement”), A has any number whatever of chances to hope or
fear (this is illustrated by an example about the drawing of balls from an
urn); and Laplace then passes on to the case in which the probabilities of
the events are unknown¹⁴². He supposes that of m similar expected events,
n have occurred, and that A expects s similar events, each of which will
procure him a gain ν if it occurs, and a loss μ if it does not occur. If we
represent by (n/m)s + z the number of the s events that will occur, the
probability that z lies within [−kt, +kt] is, by his §30,

$$ \frac{2}{\sqrt{\pi}}\int_0^t e^{-t^2}\,dt, $$

where $k^2 = 2ns(m-n)(m+s)/m^3$. (Todhunter’s solution is again slightly
different: see his Article 1040.) This latter integral is in fact also the
probability that the real benefit to A lies within the limits

$$ \frac{s[n\nu - (m-n)\mu]}{m} \pm kt(\nu+\mu). $$

(The rest of this chapter is devoted to questions concerning life annuities,
and therefore need not concern us here.)

At the beginning of Chapter X, “De l’espérance morale”, Laplace recalls
the difference, already indicated in his §2, between mathematical and
moral expectation (“l’espérance”)¹⁴³. He reminds us that, in that article,
he had cited a principle [viz. x being the physical fortune of an individual,
the increase in his moral fortune is k dx/x], the principal useful results
flowing from which he proposes to examine here. As a consequence of his
preliminary investigations he finds the following:

(i) the game that is mathematically the fairest is always disadvantageous;

(ii) it is better to expose one’s fortune in lots to independent risks than
to expose it all to the same risk.

At the start of §42 Laplace states that the principle he uses to calculate
moral expectation was proposed by Daniel Bernoulli, who had stated it in
connexion with the St Petersburg paradox (a problem that Laplace now
examines)¹⁴⁴. Of this principle Laplace writes

ainsi la supposition la plus naturelle que l’on puisse faire est
celle d’un avantage moral réciproque au bien de la personne
intéressée. [p. 449]


The eleventh chapter¹⁴⁵ of this work is entitled “De la probabilité des
témoignages”. Considering firstly the case of a single witness, who asserts
that the number i was drawn from an urn containing n numbers, Laplace
notes that any one of four hypotheses may be entertained, viz.

Ou le témoin ne trompe point et ne se trompe point; ou il ne
trompe point et se trompe; ou il trompe et ne se trompe point;
enfin, ou il trompe et se trompe à la fois. [p. 455]

Let us denote by p the probability of the veracity of the witness, and by r
the probability that he is not mistaken, and let the hypotheses given above
be denoted respectively by $H_1, H_2, H_3, H_4$, with R the announcement of the
number i. Under the assumptions of suitable uniformity and independence
of $\{H_i\}$ and R, Laplace deduces that

$$ \Pr[H_1 \vee H_2 \mid R] = \frac{pr/n + (1-p)(1-r)/n(n-1)}{pr/n + p(1-r)/n + (1-p)r/n + (1-p)(1-r)/n}. $$

It should be noted, though, that this is referred to as “la probabilité de la
sortie du n° i” [p. 457]: this is yet another example of the difficulty one
experiences in reading Laplace¹⁴⁶.

In similar vein Laplace considers the assertion that a white ball is drawn 
from an urn known to contain one white and (n — 1) black balls, while in 
the following section he considers the case of the drawing of a ball from one 
urn, its being placed in a second, and the subsequent drawing of a ball from 
this urn. Each stage of this procedure is attested to by different witnesses, 
and a probability similar to that given above is deduced. 

In §47 Laplace considers the case of simultaneous testimony, deducing
in general that, when prior probability of 1/2 is assigned to both the truth
and the falsehood of the report, the posterior probability that the report
is true is

$$ p^r \big/ [\,p^r + (1-p)^r\,], $$

where r denotes the number of witnesses and p is the probability that each
tells the truth.
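The behaviour of this posterior probability is easily tabulated. The short Python sketch below evaluates p^r/[p^r + (1 − p)^r] for a few values; the function name `posterior_true` and the chosen values of p and r are illustrative assumptions, not figures given by Laplace.

```python
def posterior_true(p, r):
    """Posterior probability that a report is true when r witnesses, each
    truthful with probability p, concur, the prior probability being 1/2."""
    return p**r / (p**r + (1 - p)**r)

for r in (1, 2, 3, 5):
    print(r, posterior_true(0.9, r))
```

When p = 1/2 the testimony carries no weight and the posterior probability remains 1/2 whatever the number of witnesses; when p exceeds 1/2 it increases with r.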

Attention is also given to the case in which two witnesses assert that 
different numbers are drawn, while the case of r witnesses is addressed via 
a finite difference equation. Subsequent examples, in similar vein, and the 
three Additions to this chapter, contribute nothing else to our discussion. 
We pass on therefore to the Supplements. 

The first Supplement, entitled “Sur l’application du calcul des probabilités
à la philosophie naturelle”, and dated 15 November 1816, is to a
large extent made up of the contents of two earlier memoirs¹⁴⁷, as we have
already mentioned (see §7.14). The material of the first of these memoirs,
one that bears the same title as this Supplement, is in the main repeated,
only some general comments and an application of some probability formulae
to the length of a seconds’ pendulum being omitted: the contents of
the other memoir, “Sur le calcul des probabilités appliqué à la philosophie
naturelle”, are repeated in their entirety. These memoirs, and the appropriate
sections of the first Supplement, deal with errors of observation, and
as such do not concern us.

The second-last section, “De la probabilité des jugements”, is, however,
more directly pertinent¹⁴⁸. The ideas had already been broached in the thirteenth
chapter of the Essai philosophique sur les probabilités, but far more
detail can be found here. After several general remarks that, although interesting
in themselves, are not germane to our present discussion, Laplace
considers (in §1, p. 526) the following question: suppose that the probability
of an offence is such that the citizens have more to fear from the
infringements that might arise from its impunity than from the errors of the
tribunals, in which case the interest of society necessitates the sentence of
the accused. Let a denote this degree of probability, and suppose that the
judge who sentences an accused declares thereby that the probability of his
offence is at least a. Let X (> 1/2) be the probability of this opinion of the
judge, varying by infinitely small degrees equal to x and equally probable a
priori. Suppose too that the tribunal is composed of p + q judges, of whom
p convict and q acquit the accused.

For a given value x of X, the probability that the opinion of the tribunal
is equitable¹⁴⁹ will be proportional to $x^p(1-x)^q$, while the probability that
it is inequitable will be proportional to $(1-x)^p x^q$. Thus, by Article 1, the
probability of the goodness of the judgment (an event that we shall denote
by G) will be

$$ \Pr[G \mid X = x;\ p, q] = x^p(1-x)^q \Big/ \left[x^p(1-x)^q + (1-x)^p x^q\right]. \qquad (64) $$


At this stage we find an argument in inverse probability appearing: the
probability of X, given that p judges convict and q acquit the accused, is
then¹⁵⁰

$$ \begin{aligned}
\Pr[x < X < x + dx \mid p, q] &= f_X(x)\,dx \\
 &= \left[x^p(1-x)^q + (1-x)^p x^q\right]dx \Big/ \int_{1/2}^1 \left[x^p(1-x)^q + (1-x)^p x^q\right]dx \\
 &= \left[x^p(1-x)^q + (1-x)^p x^q\right]dx \Big/ \int_0^1 x^p(1-x)^q\,dx. \qquad (65)
\end{aligned} $$


Thus the probability of the goodness of the judgment relative to x being
the product of (64) and (65), the dx being introduced in Laplace’s usual
way, we find finally that the probability of the goodness of the judgment
relative to all values of x is

$$ \begin{aligned}
\Pr[G] &= \int_{1/2}^1 \Pr[G \mid X = x;\ p, q]\, f_X(x)\,dx \\
 &= \int_{1/2}^1 x^p(1-x)^q\,dx \Big/ \int_0^1 x^p(1-x)^q\,dx.
\end{aligned} $$

The probability of the error to be avoided on the goodness of the judgment
is then

$$ \begin{aligned}
1 - \Pr[G] &= \int_0^{1/2} x^p(1-x)^q\,dx \Big/ \int_0^1 x^p(1-x)^q\,dx \\
 &= I_{1/2}(p+1,\ q+1),
\end{aligned} $$
the numerator being an incomplete beta-function¹⁵¹. This reduces to

$$ \frac{1}{2^{p+q+1}}\left\{1 + \frac{p+q+1}{1} + \frac{(p+q+1)(p+q)}{1\cdot 2} + \cdots
   + \frac{(p+q+1)(p+q)\cdots(p+2)}{1\cdot 2\cdots q}\right\}, $$

which becomes $2^{-(p+1)}$ in the case of unanimity (i.e. q = 0).
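The reduction of the ratio of integrals to a finite binomial sum can be confirmed in exact rational arithmetic. In the Python sketch below (the function names are illustrative assumptions), `error_by_integral` expands (1 − x)^q by the binomial theorem and integrates term by term over [0, 1/2], while `error_by_sum` evaluates the finite sum.

```python
from fractions import Fraction
from math import comb, factorial

def error_by_sum(p, q):
    """2^-(p+q+1) times the sum of the first q + 1 binomial coefficients of order p + q + 1."""
    n = p + q + 1
    return Fraction(sum(comb(n, j) for j in range(q + 1)), 2**n)

def error_by_integral(p, q):
    """Integral of x^p (1-x)^q over [0, 1/2] divided by the same integral over [0, 1]."""
    half = Fraction(1, 2)
    numer = sum(Fraction((-1)**k * comb(q, k), p + k + 1) * half**(p + k + 1)
                for k in range(q + 1))
    denom = Fraction(factorial(p) * factorial(q), factorial(p + q + 1))
    return numer / denom

for p, q in [(1, 1), (5, 2), (8, 4), (7, 0)]:
    print(p, q, error_by_sum(p, q), error_by_integral(p, q))
```

For a unanimous tribunal (q = 0) both functions return 2^-(p+1), in agreement with the remark above.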
Then follows a section (§2) in which Laplace gives approximations for
large p and q in two cases:

(i) when p − q is large, in which case the probability of the error, as given
by Article 28, is

$$ \frac{(p+q)^{p+q+3/2}}{2^{p+q+3/2}\,p^{p+1/2}\,q^{q+1/2}\,(p-q)\sqrt{\pi}}
   \left\{1 - \frac{p+q}{(p-q)^2} - \frac{(p+q)^2 - 13pq}{12pq(p+q)}\right\}; $$

(ii) when p − q is small relative to p, in which case the probability of the
error, by Article 19, is

$$ \frac{1}{\sqrt{\pi}}\int_{\tau}^{\infty} e^{-t^2}\,dt, $$

where $\tau^2 = (p-q)^2(p+q)/8pq$.

Each of these cases is illustrated by a numerical example: Pearson [1978,
pp. 692-693] presents some exact calculations for various small values of p
and q.
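The quality of the second approximation may be judged by setting it beside the exact binomial sum. The Python sketch below takes the approximation in case (ii) to be one half of the complementary error function at τ, with τ² = (p − q)²(p + q)/8pq; the function names and the tribunal sizes are illustrative assumptions and are not the examples worked by Laplace or Pearson.

```python
import math

def exact_error(p, q):
    """Exact error probability: 2^-(p+q+1) times a partial sum of binomial coefficients."""
    n = p + q + 1
    return sum(math.comb(n, j) for j in range(q + 1)) / 2**n

def approx_error(p, q):
    """(1/sqrt(pi)) * integral from tau to infinity of exp(-t^2) dt, i.e. erfc(tau)/2."""
    tau = math.sqrt((p - q)**2 * (p + q) / (8 * p * q))
    return 0.5 * math.erfc(tau)

for p, q in [(50, 50), (60, 50), (110, 100)]:
    print(p, q, exact_error(p, q), approx_error(p, q))
```

With p = q both expressions give exactly 1/2, as symmetry requires.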

There are a number of comments that one might make in connexion with
this section¹⁵². Firstly, the value of x is assumed the same for each judge,
this value being always taken to be at least 1/2. Further, a factor $\binom{p+q}{p}$ is
missing (there seems no reason not to suppose the judges interchangeable),
but this in fact cancels out in the end, and so its omission does not affect
the final result. Finally, Pearson (loc. cit.) has suggested that the factors
$x^p(1-x)^q$ and $(1-x)^p x^q$ should be multiplied respectively by the probabilities
of the guilt and innocence of the accused.

The final section of this Supplement is entitled “Sur une disposition du 
Code d’instruction criminelle”: it contains nothing useful. 

The second Supplement, dated February 1818, is entitled “Application 
du calcul des probabilités aux opérations géodésiques”. It is described by 
Todhunter [1865, art. 1050] as “very interesting, and considering the sub- 
ject and the author it cannot be called difficult”. Inspired by the desire 
to extend his application of the probability calculus to natural philosophy, 
Laplace proposes here to consider the question of triangulation. 


Cette application consiste à tirer des observations les résultats
les plus probables et à déterminer la probabilité des erreurs dont
ils sont toujours susceptibles. [p. 531]
FIGURE 7.7. Laplace’s sketch for triangulation.


After some general remarks on matters geodetic and the applicability 
of some of his earlier results, Laplace points out that the formulae to be 
discussed here are applicable to future observations; yet, suitably modified, 
they may also be applied to past data. 

As a first application, Laplace considers a great arc AA'A''... on a
sphere, around which a chain of triangles ACC', CC'C'', C'C''C''', ... is
formed, the sides CC', C'C'', C''C''', ... intersecting this arc in A, A', A'', ...
Let $A, A^{(1)}, A^{(2)}, \ldots$ denote the angles CAA', C'A'A'', C''A''A''', ..., and
let $C, C^{(1)}, C^{(2)}, \ldots$ denote the angles ACC', CC'C'', C'C''C''', ... (Laplace
states that “Je ne donne point de figure, parce qu’il est facile de la tracer
d’après ces indications” [p. 535]: nevertheless he supplies the figure in the
third Supplement.) From these data one has

$$ A + A^{(1)} + C - \alpha = \pi + t, $$

where α is the error in the observed angle C and t is the excess of the
angles of the spherical triangle ACA' over π. Setting up a series of such
equations¹⁵³ Laplace shows that, angle A being completely known, the error
in angle $A^{(n)}$ is

$$ \alpha^{(n-1)} - \alpha^{(n-2)} + \alpha^{(n-3)} - \cdots \pm \alpha $$

(+ for n odd, − for n even).

Proposing to find the probability that this error lies within given limits,
Laplace supposes¹⁵⁴ firstly that the probability of any error α is proportional
to $\exp(-h\alpha^2)$. He then derives the probability that the error in $A^{(n)}$
lies within the limits $\pm r\sqrt{n}$ as

$$ 2\sqrt{\frac{3h}{2\pi}}\int_0^r \exp(-3hr^2/2)\,dr
   \qquad\text{or}\qquad
   2\sqrt{\frac{h}{\pi}}\int_0^r \exp(-hr^2)\,dr, $$

depending on whether or not the three angles of each triangle are corrected.

Laplace next turns his attention to the determination¹⁵⁵ of h. Considering
the different values of h as causes of the observed event, he shows that
the (posterior) probability of h will be, “par le principe de la probabilité
des causes tirée des événements observés” [p. 539],

$$ h^{n/2}\exp(-h\theta^2/3)\,dh \Big/ \int_0^{\infty} h^{n/2}\exp(-h\theta^2/3)\,dh, $$

where $\theta^2 = T^2 + (T^{(1)})^2 + \cdots + (T^{(n-1)})^2$ and the T’s are the excesses in the
triangles. The value of h that it is necessary to choose (as Laplace has it)
is the mean, i.e.

$$ \int_0^{\infty} h^{(n+2)/2}\exp(-h\theta^2/3)\,dh \Big/ \int_0^{\infty} h^{n/2}\exp(-h\theta^2/3)\,dh, $$

or $3(n+2)/2\theta^2$, which, for large n, is approximately $3n/2\theta^2$.


Cette quantité est la valeur de h qui rend l’événement observé
le plus probable, la probabilité de cet événement, a priori, étant
proportionnelle à $h^{n/2}\exp(-h\theta^2/3)$. [p. 540]


Thus the probability that the error in angle $A^{(n)}$ lies within the limits
$\pm r\sqrt{n}$ becomes

$$ \frac{3\sqrt{n}}{\theta\sqrt{\pi}}\int_0^r \exp(-9nr^2/4\theta^2)\,dr, $$

and the probability that it lies within the limits $\pm 2\theta r/3$ is

$$ \frac{2}{\sqrt{\pi}}\int_0^r \exp(-r^2)\,dr. \qquad (66) $$

Similar expressions are derived elsewhere in this Supplement: see pp. 543,
544, 546 and 548.

More generally, Laplace supposes next that the law of the probability
of the error α is $\phi(\alpha)$, rather than the $\exp(-h\alpha^2)$ considered before. He
supposes also that the same errors, be they positive or negative, are equally
probable, and that $\phi$ is defined over $[-\infty, +\infty]$¹⁵⁶. The probability that a
certain error falls within given limits is also given by (66), with different
limits of integration to those found before.

The final section of this Supplement, entitled “Sur la probabilité des
résultats déduits, par des procédés quelconques, d’un grand nombre d’observations”,
contains nothing pertinent.

The third Supplement, “Application des formules géodésiques de probabilité
à la méridienne de France”, is chiefly devoted to numerical applications
of the formulae of the second Supplement. Neither the third nor the
fourth Supplement (untitled) seems to contain any pertinent remarks. This
last Supplement, written in 1825, is mainly the work of Laplace’s son: it is
devoted chiefly to generating functions¹⁵⁷.
I propose here to consider the relationship between some results of Bayes
and those given in Laplace’s Mémoire sur la probabilité des causes par les
événements.

Firstly, framing our thoughts in the “urn” situation, we see that Bayes
had the idea of a single urn, whose composition was to be examined.
Laplace, however, entertained the idea of a population of urns, and hence
could ask which of them was the “cause” of the sample. A closely related
point is this: Bayes (at least as far as communicated by Price) considered
only the “estimation” of a probability, while Laplace’s aim was definitely
to predict future behaviour¹⁵⁸.

Let us now turn to the connexion between the Rules of Bayes’s Essay
and Laplace’s expression (4). The first rule (p. 399 of the Essay and §3.4
of the present work) states that if all that is known about an event is that
it has happened p times and failed q times in p + q or n trials, the chance
that one is right in guessing that the probability of its happening in a single
trial lies between any two degrees of probability X and x, is given by
$(n+1)\binom{n}{p}$ multiplied by the difference between the series

$$ \frac{X^{p+1}}{p+1} - q\,\frac{X^{p+2}}{p+2} + \frac{q(q-1)}{2}\,\frac{X^{p+3}}{p+3} - \cdots $$

and

$$ \frac{x^{p+1}}{p+1} - q\,\frac{x^{p+2}}{p+2} + \frac{q(q-1)}{2}\,\frac{x^{p+3}}{p+3} - \cdots $$

Let us also recall that Price stated that Bayes, noting the impracticability
of this formula for large values of p and q, had deduced his second Rule
(see §3.4). We note here, from this Rule, the expression

$$ \Sigma = \frac{n+1}{n}\times\frac{\sqrt{2pq}}{\sqrt{n}}\times E\,a^p b^q \times
   \left[mz - \frac{m^3z^3}{3} + \frac{n-2}{2n}\,\frac{m^5z^5}{5}
   - \frac{(n-2)(n-4)}{(2n)(3n)}\,\frac{m^7z^7}{7}
   + \frac{(n-2)(n-4)(n-6)}{(2n)(3n)(4n)}\,\frac{m^9z^9}{9} - \text{\&c.}\right], $$

where $E = \binom{p+q}{p}$.
Price’s investigations led him to conclude that

In all cases when z is small, and also whenever the disparity
between p and q is not great 2Σ is almost exactly the true
chance required. And I have reason to think, that even in all
other cases, 2Σ gives the true chance nearer than within the
limits now determined. [Bayes 1764, art. 28]
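It is my aim now to show the correctness of this thought, by comparing 2Σ
with the limit derived by Laplace. To effect this comparison, and correcting
Bayes’s definition of m², we note that

$$ \begin{aligned}
2\Sigma &= 2\,\frac{p+q+1}{p+q}\,\frac{\sqrt{2pq}}{\sqrt{p+q}}\,\frac{(p+q)!}{p!\,q!}
   \left(\frac{p}{p+q}\right)^{p}\left(\frac{q}{p+q}\right)^{q}
   \left[mz - \frac{m^3z^3}{3} + \frac{n-2}{2n}\,\frac{m^5z^5}{5}
   - \frac{n-2}{2n}\,\frac{n-4}{3n}\,\frac{m^7z^7}{7} + \cdots\right] \\
 &= 2\,\frac{(p+q+1)!}{p!\,q!}\,\frac{p^p q^q}{(p+q)^{p+q}}\,\frac{1}{m}
   \left[mz - \frac{m^3z^3}{3} + \frac{n-2}{2n}\,\frac{m^5z^5}{5} - \cdots\right] \\
 &\approx 2\,\frac{(p+q+1)!}{p!\,q!}\,\frac{p^p q^q}{(p+q)^{p+q}}
   \left[z - \frac{m^2z^3}{3} + \frac{m^4z^5}{1\cdot 2\cdot 5} - \frac{m^6z^7}{1\cdot 2\cdot 3\cdot 7} + \cdots\right] \\
 &= 2\,\frac{(p+q+1)!}{p!\,q!}\,\frac{p^p q^q}{(p+q)^{p+q}}
   \int_0^z \left(1 - m^2z^2 + \frac{m^4z^4}{2} - \cdots\right)dz \\
 &= 2\,\frac{(p+q+1)!}{p!\,q!}\,\frac{p^p q^q}{(p+q)^{p+q}} \int_0^z e^{-m^2z^2}\,dz.
\end{aligned} $$

Using the Stirling-de Moivre formula and the approximation

$$ (p+q+1)^{p+q+3/2} = e\,(p+q)^{p+q+3/2}, $$

an equality that comes from Laplace [1774, p. 32], who “deduces” it from

$$ \left(1 + \frac{1}{p+q}\right)^{p+q+3/2}, $$

we find that

$$ 2\Sigma \approx \frac{2(p+q)^{3/2}}{\sqrt{2\pi pq}}\int_0^z e^{-(q+p)^3z^2/2pq}\,dz, $$

which we recall is exactly the approximation derived by Laplace (see (3)
above).

Thus Price’s conjecture as to the goodness of 2Σ as an approximation to
the desired probability is seen to be well founded (note also the footnote
on pp. 316-317 of Bayes [1764]).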
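The closeness of the limiting form to the exact posterior probability can be examined numerically. In the Python sketch below the limiting form is taken, with m² = (p + q)³/2pq, to be the error function evaluated at mz, and the exact probability that the unknown chance lies within z of p/(p + q) is obtained from the binomial-tail expression of the incomplete beta-function for integral p and q. The function names and the values of p, q and z are illustrative assumptions.

```python
import math

def beta_cdf(x, p, q):
    """I_x(p + 1, q + 1) for integers p, q, via Pr[Bin(p + q + 1, x) >= p + 1]."""
    n = p + q + 1
    return sum(math.comb(n, j) * x**j * (1 - x)**(n - j) for j in range(p + 1, n + 1))

def exact_prob(p, q, z):
    """Posterior probability (uniform prior) that the chance lies within z of p/(p + q)."""
    centre = p / (p + q)
    return beta_cdf(centre + z, p, q) - beta_cdf(centre - z, p, q)

def limiting_form(p, q, z):
    """(2m/sqrt(pi)) * integral from 0 to z of exp(-m^2 z^2) dz = erf(m z)."""
    m = math.sqrt((p + q)**3 / (2 * p * q))
    return math.erf(m * z)

for p, q, z in [(50, 50, 0.05), (80, 20, 0.04), (300, 200, 0.02)]:
    print(p, q, z, exact_prob(p, q, z), limiting_form(p, q, z))
```

The sketch requires that p/(p + q) ± z remain within [0, 1].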

Let us return now to the probability Q(p,q; m,n) defined in §7.3. The
question that presents itself is, what is the relation between this probability
and that derived by Bayes? The answer is by no means as simple as it would
at first blush seem¹⁵⁹.

In his discussion of this matter, Pearson writes¹⁶⁰


... Bayes’ Theorem as we now usually state it is of the following
nature. We suppose past experience to be represented by
p successes and q failures in n trials, and we ask what is the
chance of r successes and s failures in further m trials. We hold
this problem answered by the expression

$$ C_r = \frac{(r+s)!}{r!\,s!}\;\frac{\int_0^1 x^{p+r}(1-x)^{q+s}\,dx}{\int_0^1 x^{p}(1-x)^{q}\,dx} $$

[1978, p. 366]


What I would stress, by giving this quotation, is that at this time (viz. 
the 1920’s) the idea that Bayes had proved the so-called “discrete Bayes’s 
Theorem” was by no means all-pervading. (Pearson later goes on to state 
that his quoted result is really “Condorcet’s and Laplace’s extension of 
Bayes.” ) 

But let us return to our muttons¹⁶¹. Recall that Bayes quite definitely
considered only the probability of an event’s happening in a single trial: it
seems, therefore, that we should look more closely at

$$ Q(p,q;1,0) = \int_0^1 x^{p+1}(1-x)^{q}\,dx \Big/ \int_0^1 x^{p}(1-x)^{q}\,dx $$

(which is Pearson’s $C_1$). This, however, is still not Bayes’s result: that,
we see, corresponds in fact (taking the limits of integration $x_1$ and $x_2$ in
the numerator to be 0 and 1 respectively) to m = 0 = n, which looks
rather odd, since Q(p,q;0,0) has no (nontrivial) meaning when applied to
a future occurrence. It seems, therefore, that Bayes’s phrase “a single trial”
does not (nay, cannot) refer to a future event: to what, then, does it
refer?

That this question has been viewed as occasioning some difficulty in the 
past may be seen by referring to Pearson [1978, pp. 368-369]. In these 
lectures on the history of statistics, Pearson even went so far as to say of 
Todhunter “like Price he does not show what Bayes means by the ‘single 
throw’ ” (op. cit. p. 369). That Todhunter was aware of a possible confu- 
sion (although he himself supported Price) is shown by his assertion that 
Lubbock and Drinkwater-Bethune believed that Bayes (or maybe Price) 
confounded the probabilities of Bayes and Laplace. The pertinent section 
from this little-known tract on probability (written c.1830) reads as follows: 


Bayes, or perhaps we should rather say Price, seems to have confounded
the probability thus determined [i.e. in Bayes’s Proposition 10],
with the probability that an event which has been
already observed m [sic] times in p + q experiments, will happen
again [i.e. Q(p,q;1,0)]. The difference between the two is
obvious ... [p. 48].

I must confess to agreeing with these authors (that the confusion was caused
by Price and not Bayes) rather than the more distinguished historian.

Pearson [1978, p. 369] suggests that the discussion given by Timerding 
[1908] might in fact be even more obscure than Todhunter’s. Timerding 
suggested that Bayes’s table be replaced by a box with a sliding drawer, 
with balls being dropped into the box, some falling inside and some outside 
the drawer. One would then be determining, by Bayes’s Proposition 9, the 
probability that the drawer was pushed in a certain distance. However, it is 
probably true that Timerding in fact saw no problem at all and proposed 
his model as an illuminating alternative to Bayes’s. 

If Bayes’s original experiment is recalled, it will be remembered that his
first postulate concerns the throwing of a ball on a square level table (“at
random”, we might say)¹⁶²; and it is in terms of this original toss that
“successes” and “failures” are then defined. It is to this first toss of a ball
that Pearson [1978, p. 367] attributes the reference “a single trial”, and
what Bayes is finding is the probability of the chance of this event’s lying
between any two degrees of probability after p + q further throws (or throws
with another ball) have been made.

Of course, all these “problems” about the meaning of the phrase “single 
trial” vanish in the light of Bayes’s own words. For in the statement of his 
second postulate (see §3.4) Bayes makes quite explicit the sense in which 
the awkward words will be used. Writing of the balls thrown upon the table 
ABCD, he says 


I suppose that the ball W shall be 1st thrown, and through the
point where it rests a line os shall be drawn parallel to AD, and
meeting CD and AB in s and o; and that afterwards the ball
O shall be thrown p + q or n times, and that its resting between
AD and os after a single throw be called the happening of the
event M in a single trial. [p. 385]


Indeed, if we recall Bayes’s statement of his problem (see §3.4), we see that 
his insistence on the words “single trial” (or some equivalent formulation), 
while perhaps often unnecessary, serves to make quite clear that which is 
his concern, as contrasted with the statement, at the outset of his problem, 
about “the number of times” that some event has happened or failed. 

However, it seems to me that it might be possible to reconcile Bayes’s
and Laplace’s investigations, in a manner that has perhaps not been sufficiently
stressed before. As we have already seen (see §7.3), Laplace, in his
discussion [1774, p. 30] of this problem, wrote

la probabilité que x est le vrai rapport du nombre des billets
blancs au nombre total des billets est par le principe de l’article
précédent égale à

$$ x^p(1-x)^q\,dx \Big/ \int x^p(1-x)^q\,dx $$

where the integral is taken from 0 to 1. If we integrate this expression
from $x_1$ to $x_2$, we obtain the probability that x, the true ratio of white to
total number of tickets, lies between $x_1$ and $x_2$, given that p white and q
black tickets have been drawn, i.e. $\Pr[x_1 < x < x_2 \mid p \text{ white \& } q \text{ black}]$.
But this is exactly Bayes’s result! What Laplace went on to find (viz. our
Q(p,q;m,n)) has, I think, been mistakenly confused with this result.
Summarizing, one might say

(i) Bayes found $\Pr[x_1 < x < x_2 \mid p \text{ successes \& } q \text{ failures}]$, while

(ii) Laplace found (or, more correctly, almost found) the same thing,
and, in addition, found an expression for Pr[m future successes & n
future failures | p successes & q failures].


I believe that neither Bayes nor Laplace confused these probabilities, yet 
this early clarity was soon lost. Pearson [1978, p. 368] in fact says 


There is great obscurity about the whole matter, but if my view 
be correct Bayes had certainly not reached ‘Bayes’ Theorem’. 


While one must agree with the first part of this statement, one might hesi- 
tate about accepting the second clause, until one remembers the expression 
that Pearson refers to as “Bayes’s Theorem” (as we have already said, this 
is our Q(p,q;m, n), which is due to Laplace and not even attempted by our 
reverend originator). 
Laplace begins the fifteenth article of his Mémoire sur les probabilités (see
§7.6) with the following words:

supposons qu’un événement donné ne puisse être produit que
par les n causes $A, A', \ldots, A^{(n-1)}$: soient x la probabilité qui
en résulte pour l’existence de A; x' celle de l’existence de A';
x'' celle de l’existence de A'', etc. [pp. 415-416]

It is clear from the ensuing discussion that x, x', ... are intended to denote
conditional probabilities given E: we shall denote $\Pr[A_i \mid E]$ by $x_i$. Further,
let $a_i = \Pr[E \mid A_i]$.

Laplace next states that 


la probabilité d’un second événement semblable au premier sera
égale au produit de a [our $a_1$] par la probabilité x [$x_1$] de la cause
A [$A_1$], plus au produit de a' [$a_2$] par la probabilité x' [$x_2$] de
la cause A' [$A_2$], plus etc.; d’où il suit que l’on aura

ax + a'x' + a''x'' + etc.

pour cette probabilité ... [p. 416].
To verify this, notice firstly that, under the assumption of a discrete uniform
prior,

$$ \begin{aligned}
\Pr[E_2 \mid E_1] &= \Pr[E_1E_2] \big/ \Pr[E_1] \\
 &= \sum_i \Pr[E_1E_2 \mid A_i]\Pr[A_i] \Big/ \sum_j \Pr[E_1 \mid A_j]\Pr[A_j] \\
 &= \sum_i \Pr[E_1E_2 \mid A_i] \Big/ \sum_j \Pr[E_1 \mid A_j] \\
 &= \sum_i \Pr[E_1E_2 \mid A_i] \Big/ \sum_j a_j.
\end{aligned} $$

Assuming the conditional independence of the E’s and using the fact that
the events are “semblable”, we have

$$ \begin{aligned}
\Pr[E_2 \mid E_1] &= \sum_i \Pr[E_1 \mid A_i]\Pr[E_2 \mid A_i] \Big/ \sum_j a_j \\
 &= \sum_i a_i^2 \Big/ \sum_i a_i.
\end{aligned} $$

Furthermore,

$$ \begin{aligned}
\Pr[E_2 \mid E_1] &= \Pr[E_1E_2] \big/ \Pr[E_1] \\
 &= \sum_i \Pr[E_1E_2 \mid A_i]\Pr[A_i] \big/ \Pr[E_1] \\
 &= \sum_i \Pr[E_1 \mid A_i]\Pr[E_2 \mid A_i]\Pr[A_i] \big/ \Pr[E_1] \\
 &= \sum_i \Pr[E_2 \mid A_i]\Pr[E_1A_i] \big/ \Pr[E_1] \\
 &= \sum_i \Pr[E_2 \mid A_i]\Pr[A_i \mid E_1] \\
 &= \sum_i a_ix_i.
\end{aligned} $$

Thus, under the assumptions

(i) $\Pr[A_1] = \Pr[A_2] = \cdots = \Pr[A_n] = 1/n$, and

(ii) $E_1$ and $E_2$ are similar and conditionally independent with respect to
each of the $A_i$,

we see that

$$ \sum_i a_ix_i = \Pr[E_2 \mid E_1] = \sum_i a_i^2 \Big/ \sum_i a_i. $$

Similar expressions are derived for $\Pr[E_3E_2 \mid E_1]$, etc., under a suitable
extension of assumption (ii) above.
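Finally, using the fact that $\sum_i x_i = 1$ and the n equations

$$ \begin{aligned}
\sum_i a_ix_i &= \sum_i a_i^2 \Big/ \sum_i a_i \\
\sum_i a_i^2x_i &= \sum_i a_i^3 \Big/ \sum_i a_i \\
 &\ \,\vdots \\
\sum_i a_i^nx_i &= \sum_i a_i^{n+1} \Big/ \sum_i a_i
\end{aligned} $$

we find that $x_i = a_i / \sum_j a_j$, or $\Pr[A_i \mid E] = \Pr[E \mid A_i] / \sum_j \Pr[E \mid A_j]$, as
required.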
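That the values x_i = a_i/Σa_j do satisfy every equation of this family is readily confirmed in exact arithmetic. The Python sketch below checks, for an arbitrary set of likelihoods, that Σ a_i^k x_i = Σ a_i^(k+1)/Σ a_i for several k (the case k = 0 being Σ x_i = 1); the function names and the likelihood values are illustrative assumptions.

```python
from fractions import Fraction

def posterior_uniform_prior(a):
    """x_i = a_i / sum_j a_j: posterior probabilities of the causes under a uniform prior."""
    total = sum(a)
    return [ai / total for ai in a]

def moment_identity_holds(a, k):
    """Check that sum_i a_i^k x_i equals sum_i a_i^(k+1) / sum_i a_i."""
    x = posterior_uniform_prior(a)
    lhs = sum(ai**k * xi for ai, xi in zip(a, x))
    rhs = sum(ai**(k + 1) for ai in a) / sum(a)
    return lhs == rhs

# Hypothetical likelihoods Pr[E | A_i] for three causes.
a = [Fraction(1, 10), Fraction(3, 10), Fraction(1, 2)]
print(posterior_uniform_prior(a))
print([moment_identity_holds(a, k) for k in range(4)])
```

The check shows only that the stated solution is consistent with the equations; uniqueness follows from the linear independence of the equations when the a_i are distinct.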
The Suite to the Mémoire sur les approximations des formules qui sont
fonctions de très grands nombres opens with the following words:

Ce Mémoire étant une suite de celui qui a paru sur le même
objet dans le Volume précédent, je conserverai l’ordre des articles
et des numéros. J’ai donné, dans le premier article, une
méthode générale pour réduire en séries très convergentes les
fonctions différentielles qui renferment des facteurs élevés à de
grandes puissances. Dans le second article, j’ai ramené à ce
genre d’intégrales toutes les fonctions données par des équations
linéaires aux différences ordinaires ou partielles, finies et infiniment
petites; et je suis ainsi parvenu, dans le troisième article,
à déterminer les valeurs approchées de plusieurs formules qui se
rencontrent fréquemment dans l’Analyse, mais dont l’application
devient très pénible lorsque les nombres dont elles sont fonctions
sont considérables. Il me reste présentement à faire voir l’usage
de cette analyse dans la théorie des hasards. [p. 295]


8 


Poisson to Whitworth 


I hope the gentle reader will excuse me for 
dwelling on these & the like particulars, 
which, however insignificant they may ap- 
pear to grovelling vulgar minds, yet will 
certainly help a philosopher to enlarge his 
thoughts and imagination, and apply them 
to the benefit of public as well as private
life.


Jonathan Swift, Travels into several 
Remote Nations of the World, by 
L. Gulliver. 


8.1 Siméon-Denis Poisson (1781-1840) 


Only two works by this author seem relevant to the present study: the first
of these is a memoir published in 1830 (read 8th February 1829), and the
second is the book Recherches sur la probabilité des jugements en matière
criminelle et en matière civile, précédées des règles générales du calcul des
probabilités of 1837.

Poisson begins his memoir, entitled “Sur la proportion des naissances
des filles et des garçons”, with some observations on the ratios of male to
female births in various parts of France over some ten years, concluding
this introduction with a description of Laplace’s Théorie analytique des
probabilités as an


ouvrage aussi éminemment remarquable par la variété des ques- 
tions qui y sont traitées, que par la généralité des méthodes que 
Laplace a imaginées pour les résoudre. [p. 243] 


The first (non-introductory) section of the work is entitled “Probabilité
de la répétition d’un événement dont la chance est donnée.” Here Poisson
proves the Bernoulli weak law of large numbers¹, it being shown that if A
and B are complementary events of constant probabilities p and q respectively,
then the probability U that, in n trials, the number of occurrences
of the event A lies between the limits

\[ N \pm u\sqrt{2(n+1)pq} , \]

where N is the greatest integer not exceeding np, is given by

\[ U = 1 - \frac{2}{\sqrt{\pi}} \int_u^{\infty} e^{-t^2}\,dt + \frac{e^{-u^2}}{\sqrt{2\pi npq}} . \qquad (1) \]


This is followed by the statement: 


le rapport x′/n du nombre de fois que l’événement A arrivera
au nombre total des épreuves, différera donc de moins en moins
de la probabilité p de cet événement; et l’on pourra toujours
prendre n assez grand pour qu’il y ait la probabilité U que la
différence (x′/n) − p sera aussi petite que l’on voudra; ce qui
est, comme on sait, le théorème de Jacques Bernouilli sur la
répétition, dans un très-grand nombre d’épreuves, d’un événement
dont la chance est donnée à priori. [p. 261]


There then ensues, in the tenth article of this section, a discussion of
the case in which n is very large and p is very small, so that np is either a
fraction or of moderate size. This leads to what is now termed the Poisson
distribution², it being found that, under the above conditions and with
pn = ω, the probability that A occurs no more than x times in n trials is


\[ \sum_{k=0}^{x} e^{-\omega} \omega^k / k! . \]
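
As a rough numerical illustration of this limiting result, the following Python sketch compares the exact binomial probability of at most x occurrences with the Poisson sum. The values of n and p, and the function names, are hypothetical choices made purely for illustration; they are not taken from Poisson's memoir.

```python
from math import comb, exp, factorial


def binomial_cdf(x, n, p):
    """Exact probability that A occurs no more than x times in n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def poisson_cdf(x, omega):
    """Poisson's limiting sum: sum over k = 0..x of e^{-omega} omega^k / k!."""
    return sum(exp(-omega) * omega**k / factorial(k) for k in range(x + 1))


# Hypothetical values: n very large, p very small, omega = n * p moderate.
n, p = 10_000, 0.0003
omega = n * p

rows = [(x, binomial_cdf(x, n, p), poisson_cdf(x, omega)) for x in range(8)]
max_gap = max(abs(b - q) for _, b, q in rows)

for x, b, q in rows:
    print(f"x={x}: binomial={b:.6f}  Poisson={q:.6f}")
print(f"largest discrepancy: {max_gap:.2e}")
```

With values of this kind the two columns agree to about three decimal places, which is the sense in which the sum above replaces the binomial computation.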


Section II, “Probabilités des événements simples et des événements futurs
d’après les événements observés” has its main thrust described in the
first article (Article 12 of the paper) as follows:


Jusqu’ici nous avons supposé connue à priori, la chance p de
l’événement A, et nous en avons conclu la probabilité d’un
événement futur, relatif à la répétition de A sur un très-grand
nombre d’épreuves; mais dans les applications du calcul des
hasards aux phénomènes naturels, et particulièrement dans la
question indiquée par le titre de ce Mémoire, la valeur de p doit,
au contraire, se déduire autant qu’il est possible, des événements
observés en très-grands nombres, pour servir ensuite à calculer
la probabilité des événements futurs. C’est ce problème qui va
maintenant nous occuper. [p. 265]


Supposing that p, the unknown probability of A, is susceptible only of
values in the set $\{v_1, v_2, \ldots, v_m\}$ with


\[ R_n = \Pr[p = v_n], \quad n \in \{1, 2, \ldots, m\}, \]


Poisson takes $V_1, V_2, \ldots, V_m$ as

les probabilités correspondantes d’un événement composé C, en
sorte que $V_n$ désigne la probabilité de C en fonction de $v_n$, qui
aurait lieu s’il était certain qu’on eût p = $v_n$. [pp. 265-266]


The event C having been observed, Poisson proposes to determine $R_n$
by, firstly, replacing each $V_n$ by $N_n/\mu$, where μ and the $N_n$ are integral,
and then identifying the question with that of the drawing of balls from
urns, with the n-th urn containing μ balls of which $N_n$ are white. What is
required is the probability that a drawn ball, found to be white (event C),
was taken from the n-th urn, a probability that Poisson finds to be given
by


\[ R_n = V_n \Big/ \sum V_n . \qquad (2) \]


This ratio of course is really $\Pr[p = v_n \mid C]$ rather than an absolute probability
as Poisson suggests, and indeed many of the probabilities given below
should be similarly conditioned.

If C′ is another compound event depending on A with


\[ V'_n = \Pr[C' \mid p = v_n] , \]


then the probability of C′ is given by


\[ P = \sum V'_n R_n = \sum V'_n V_n \Big/ \sum V_n . \qquad (3) \]
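
The urn argument behind formulae (2) and (3) can be traced with exact fractions. The sketch below is only an illustration under stated assumptions: the three admissible values of p, the observed event C (A occurring 3 times in 4 trials) and the further event C′ (A occurring on one more trial) are all hypothetical, and the candidate values are treated as equally probable beforehand, as in Poisson's urn scheme.

```python
from fractions import Fraction
from math import comb

# Hypothetical admissible values v_n of the unknown chance p.
values = [Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)]

# Hypothetical observed compound event C: A occurs 3 times in 4 trials,
# so that V_n is the binomial probability of C when p = v_n.
V = [comb(4, 3) * v**3 * (1 - v) for v in values]

# Formula (2): R_n = V_n / sum(V), i.e. Pr[p = v_n | C].
total = sum(V)
R = [Vn / total for Vn in V]

# Hypothetical further event C': A occurs on one more trial, so V'_n = v_n.
V_prime = values

# Formula (3): P = sum of V'_n * R_n.
P = sum(Vp * Rn for Vp, Rn in zip(V_prime, R))

print("R =", R)
print("P =", P)
```

Working with `Fraction` keeps the weights exact, so that the sum of the R_n can be checked to be precisely 1.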


In the next article Poisson supposes p to be susceptible of any value in
[0,1], and deduces that the formulae (2) and (3) then become


\[ \Pr[v < p < v + dv] = V\,dv \Big/ \int_0^1 V\,dv \]


\[ \Pr[C'] = \int_0^1 V'V\,dv \Big/ \int_0^1 V\,dv \]


respectively. Furthermore


\[ Z = \Pr[a < p < b] = \int_a^b V\,dv \Big/ \int_0^1 V\,dv , \]


and if Q is “la probabilité que l’événement C′ répondra à l’une des valeurs
de p comprises entre ces limites” [p. 268], then


\[ Q = \int_a^b V'V\,dv \Big/ \int_0^1 V\,dv . \]


and 
So much is standard: Poisson’s unique contribution is now to note that,
if M and M′ denote respectively the maximum and minimum of V′ over
the complement of {p : a < p < b}, and if T denotes the probability of C′,
then


\[ Q + M'(1 - Z) < T < Q + M(1 - Z) , \]

an expression that yields, for (1 − Z) negligible,

\[ T \approx Q , \]


“ce qui en simplifiera le calcul” [p. 269].
As a first illustration of the use of these formulae Poisson considers the
case in which the event C is the occurrence of A s times in m trials. Then


\[ V = \binom{m}{s} v^s (1-v)^{m-s} , \]


and hence

\[ R = v^s(1-v)^{m-s}\,dv \Big/ \int_0^1 v^s(1-v)^{m-s}\,dv \qquad (4) \]

\[ Z = \int_a^b v^s(1-v)^{m-s}\,dv \Big/ \int_0^1 v^s(1-v)^{m-s}\,dv . \qquad (5) \]


The integrand achieves its maximum value $G = (s/m)^s (1 - s/m)^{m-s}$ on
our taking v = s/m (= g, say). On setting


\[ v^s(1-v)^{m-s} = G e^{-t^2} \]

one finds, on taking logarithms, that $v = g + g't + g''t^2 + \cdots$, where

\[ g' = \sqrt{2(m-s)s/m^3} ; \qquad g'' = 2(m-2s)/3m^2 ; \]


and hence, on neglecting terms of order 1/m, one has


\[ \int_0^1 v^s(1-v)^{m-s}\,dv = \int_{-\infty}^{\infty} G e^{-t^2} (dv/dt)\,dt = G g' \sqrt{\pi} . \qquad (6) \]
Attention is next turned to the numerator of Z in (5) above. If one takes

\[ a = g - g'z , \quad b = g + g'z , \quad v = g + g'\theta \]

one obtains, to the same order of magnitude as before,


\[ t = \theta - \theta^2 g''/g' , \]

and hence

\[ e^{-t^2} = e^{-\theta^2} (1 + 2g''\theta^3/g') . \]


Hence


\[ \int_a^b v^s(1-v)^{m-s}\,dv = \int_{-z}^{z} G g' e^{-\theta^2} (1 + 2g''\theta^3/g')\,d\theta
   = 2Gg' \int_0^z e^{-\theta^2}\,d\theta . \]


It then follows from (5) and (6) that³


\[ Z = \Pr[(s/m) - g'z < p < (s/m) + g'z] = \frac{2}{\sqrt{\pi}} \int_0^z e^{-\theta^2}\,d\theta , \]


while from (4) one has


\[ R = \Pr[(s/m) + g'\theta < p < (s/m) + g'\theta + g'\,d\theta]
   = \frac{1}{\sqrt{\pi}} e^{-\theta^2} (1 + 2g''\theta^3/g')\,d\theta . \]

On taking z = 3 Poisson finds that 1 − Z = 0.00002209, a quantity
small enough to allow one to make the approximation T ≈ Q as mentioned
above, where


\[ Q = \frac{1}{\sqrt{\pi}} \int_{-z}^{z} \Pi\, e^{-\theta^2} (1 + 2g''\theta^3/g')\,d\theta \]

and where Π is the function previously denoted by V′ with v replaced by
p = (s/m) + g′θ.
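
Two of the numerical claims made above are easy to check. The following Python sketch first evaluates 1 − Z for z = 3, using the fact that (2/√π)∫₀ᶻ e^{−θ²} dθ is the error function, and then compares the approximation (6), G g′ √π, with the exact value of the integral (a Beta function). The counts m and s are hypothetical and serve only to indicate the size of the neglected terms.

```python
from math import erfc, exp, lgamma, log, pi, sqrt

# 1 - Z for z = 3: Z = (2/sqrt(pi)) * int_0^z e^{-t^2} dt = erf(z),
# so the complement is the complementary error function.
one_minus_Z = erfc(3.0)


def log_exact_integral(s, m):
    """log of int_0^1 v^s (1-v)^(m-s) dv, i.e. log B(s+1, m-s+1)."""
    return lgamma(s + 1) + lgamma(m - s + 1) - lgamma(m + 2)


def log_approximation(s, m):
    """log of G * g' * sqrt(pi), the right-hand side of (6)."""
    g = s / m
    g_prime = sqrt(2 * (m - s) * s / m**3)
    log_G = s * log(g) + (m - s) * log(1 - g)
    return log_G + log(g_prime) + 0.5 * log(pi)


# Hypothetical counts, chosen only to illustrate the size of the error.
m, s = 1000, 515
ratio = exp(log_approximation(s, m) - log_exact_integral(s, m))

print(f"1 - Z = {one_minus_Z:.8f}")
print(f"approximation / exact = {ratio:.6f}")
```

Logarithms are used throughout because the integrand is far too small to be represented directly for counts of this size; the ratio differs from 1 by a quantity of order 1/m, in keeping with the terms neglected in (6).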

In Article 15 Poisson turns his attention to the case in which C′ is the
event that, in n trials, A occurs x or fewer times, where x and n − x are
very large. The problem is solved under the assumption that the difference
between x/(n + 1) and s/m is of order 1/√m, or more precisely


\[ x/(n+1) = (s/m) - \gamma g' \qquad (7) \]


where γ (positive) is either a fraction or a small number. After some
manipulation it is found that


PrW(A) < 2]=~ | oo det 


where N(A) is the number of times A occurs in n trials, x satisfies (7), or


\[ x = (n+1)s/m - (c/m)\sqrt{2(n+1)(1+\alpha^2)(m-s)s} , \]
and where


\[ \alpha^2 = (n+1)/m , \qquad c = \gamma\alpha/\sqrt{1+\alpha^2} , \]
i = __(mts)V2_ ~c7 


34/ns(m—s)(1+a?) : 


° 
I] 


An expression is also given for the probability that N(A) falls between two 
given values. 

Comparison of this latter result with (1) is then undertaken under various 
approximations. For example, if u > 0 and quantities of order 1/n are 
neglected, 


v=1-— | got dt + me—” | Vint em m— 8) 


while if n is very small with respect to m (of a magnitude comparable to
√m in fact), the probability that N(A) lies within the limits


\[ N \pm (u/m)\sqrt{2(n+1)(m-s)s} \]


(where N is the greatest integer not exceeding ns/m) is


\[ U = 1 - \frac{2}{\sqrt{\pi}} \int_u^{\infty} e^{-t^2}\,dt + \frac{m\,e^{-u^2}}{\sqrt{2\pi n s(m-s)}} , \qquad (8) \]


which coincides with (1) if p = s/m and q = (m − s)/m.


Lors donc que le nombre n des événements futurs est très-petit
eu égard au nombre m des événements observés, les limites
du nombre de fois que A arrivera et leur probabilité pourront
se calculer en prenant pour la probabilité p de A, le rapport
s/m du nombre de fois que cet événement est arrivé au nombre
total des observations, comme si cette valeur de p était certaine
et donnée à priori. Mais il n’en est pas ainsi quand les deux
nombres n et m sont du même ordre de grandeur. [p. 278]


Mention is also made of the possibility of neglecting the last term in (8). 

In the seventeenth article Poisson stresses the importance of ensuring
that the assumption that the probability of the event A remains the same
in each trial (past or future) is satisfied. This is illustrated by the drawing
of balls from an infinite urn, an example that is then extended to take
account of m similar urns⁴, and it is shown that the probability of drawing
a white ball is $\sum_{i=1}^{m} p_i / m$. This mean value can then be substituted for p in
the preceding formulae.
This essentially concludes the main theoretical portion of the memoir.
Poisson now compares the question of births to that of different urns. The
event A corresponds to the birth of a boy, the probability p being susceptible
of all values from zero to one. Since p may well vary with time and place⁵
(and also from one family to another), an average value of p is required,
and


c’est en supposant que cette moyenne ne variera pas, que l’on
calcule la probabilité des naissances masculines pendant un
autre intervalle de temps. [p. 283]


Application of the formulae developed earlier in the memoir is made to 
natal data for various periods, and it is found that 


la chance d’une naissance masculine dépend des localités, en
sorte qu’elle varie, pour une même année, d’un département
à un autre, et pour un même département, d’une année à une
autre. [p. 286]


In Article 21 Poisson supposes that two events A and A′ happen s and
s′ times respectively in m and m′ trials, where s, m − s, s′ and m′ − s′ are
all very large. If p and p′ denote the probabilities of A and A′, and if p′
exceeds p at least by a given quantity ω, it is shown, under approximations
of the same sort as those used earlier and when s′/m′ − s/m = ω, that


\[ T = \Pr[p' - p > \omega] = \frac{1}{2}
   + \frac{3\lambda' - 2\lambda\rho}{2\sqrt{\pi(1+\rho^2)}}
   - \frac{\lambda' + \lambda\rho}{2\sqrt{\pi(1+\rho^2)^3}} , \qquad (9) \]


where $\lambda = (m-2s)\sqrt{2} \big/ 3\sqrt{ms(m-s)}$ (and λ′ is defined analogously) and

\[ \rho^2 = (m'/m)^3 (s/s')(m-s)/(m'-s') . \]


He notes further that T → 1/2 as m and m′ → ∞.
If one supposes in addition that s/m = s′/m′ then (9) becomes


\[ \Pr[p' > p] = \frac{1}{2} + \frac{(m - m')\lambda}{\sqrt{\pi m'(m+m')}} , \]

and in the case in which

\[ s'/m' - s/m - \omega = a\sqrt{2s(m-s)/m^3} = a g' , \]


where |a| is small, one obtains


\[ \Pr[p' - p > \omega] = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^{c} e^{-t^2}\,dt , \]


where $c = \rho a/\sqrt{1+\rho^2}$. This formula, Poisson notes, coincides with a
result given by Laplace in his Théorie analytique des probabilités⁶.
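
The special case s/m = s′/m′ lends itself to a numerical check. The sketch below reads the approximation as 1/2 + (m − m′)λ/√(πm′(m + m′)), with λ = (m − 2s)√2/3√(ms(m − s)), and compares it with a direct numerical integration in which p and p′ are given independent Beta posterior distributions (that is, uniform prior distributions are assumed). Both this reading of the formula and the counts used are assumptions made for the purpose of illustration.

```python
from math import exp, lgamma, log, pi, sqrt


def beta_density(a, b, xs):
    """Density of a Beta(a, b) distribution at each point of xs."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return [exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x)) for x in xs]


def prob_second_exceeds_first(s, m, s2, m2, steps=20_000):
    """Pr[p' > p] by the midpoint rule, with Beta posteriors for p and p'."""
    h = 1.0 / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    f1 = beta_density(s + 1, m - s + 1, xs)       # posterior density of p
    f2 = beta_density(s2 + 1, m2 - s2 + 1, xs)    # posterior density of p'
    total, cdf1 = 0.0, 0.0
    for d1, d2 in zip(f1, f2):
        # Pr[p < x] at the cell midpoint: earlier cells plus half of this one.
        total += d2 * h * (cdf1 + 0.5 * d1 * h)
        cdf1 += d1 * h
    return total


def approximation(s, m, m2):
    """1/2 + (m - m') * lambda / sqrt(pi * m' * (m + m'))."""
    lam = (m - 2 * s) * sqrt(2) / (3 * sqrt(m * s * (m - s)))
    return 0.5 + (m - m2) * lam / sqrt(pi * m2 * (m + m2))


# Hypothetical counts with s/m = s'/m' = 0.4.
s, m, s2, m2 = 400, 1000, 100, 250
exact = prob_second_exceeds_first(s, m, s2, m2)
approx = approximation(s, m, m2)
print(f"numerical integration: {exact:.5f}   approximation: {approx:.5f}")
```

The departure from 1/2 arises solely from the asymmetry of the two posterior distributions, which is why it disappears as m and m′ increase.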

Further application of the preceding formulae to the question of births 
follows, it being found that 
Nous pouvons donc conclure qu’à l’époque actuelle et pour la
France entière, la probabilité d’une naissance masculine n’éprouve
que de très-petites variations d’une année à une autre, et prendre
pour sa valeur, la moyenne des dix années que nous avons
considérées, c’est-à-dire, 0,5159. [p. 307]


This completes our study of the memoir: we now turn to Poisson’s
Recherches sur la probabilité des jugements en matière criminelle et en
matière civile, précédées des règles générales du calcul des probabilités of
1837. Although the major part (if indeed not all) of this work is of no
little interest, we shall firmly confine ourselves to pertinent passages.

After commenting on the use made by Condorcet and Laplace of Bayes’s 
Theorem in their work on the probability of judgment and testimony (to 
which work animadversion has already been made in the present treatise), 
Poisson expresses the doubts to which he was still subject on this matter 
after reading these authors, and that resulted in his approaching the matter 
from a different point of view. 


Le caractère distinctif de cette nouvelle théorie de la probabilité
des jugements criminels étant donc de déterminer d’abord,
d’après les données de l’observation dans un très grand nombre
d’affaires de même nature, la chance d’erreur du vote des
juges, et celle de la culpabilité des accusés avant l’ouverture des
débats, elle doit convenir à toutes les espèces nombreuses de
jugements. [p. 25]


Poisson emphasizes the role of prior knowledge as follows: 


les règles qui servent à remonter de la probabilité d’un événement
observé à celle de sa cause, et qui sont la base de la théorie
dont nous nous occupons, exigent que l’on ait égard à toute
présomption antérieure à l’observation, lorsque l’on ne suppose
pas, ou qu’on n’a pas démontré qu’il n’en existe aucune. [p. 4]


The first chapter of this work is entitled “Règles générales des probabilités”.
Poisson starts off with a precise statement of the way in which he
will use the word “probability”:


La probabilité d’un événement est la raison que nous avons de
croire qu’il aura ou qu’il a eu lieu. Quoiqu’il s’agisse, dans un
cas, d’un fait accompli, et dans l’autre, d’une chose éventuelle;
pour nous, la probabilité est cependant la même, lorsque tout
est d’ailleurs égal dans ces deux cas, en eux-mêmes si différents.
[p. 30]


He further stresses the dependence of probability upon individual experience
with the words

La probabilité dépendant des connaissances que nous avons sur
un événement, elle peut être inégale pour un même événement
et pour diverses personnes [p. 30],


and he points out further that the term “probability” will also be used with
this meaning, “chance” being reserved⁸


aux événements en eux-mêmes et indépendamment de la connaissance
que nous en avons. [p. 31]


He further defines 


La mesure de la probabilité d’un événement, est le rapport du
nombre de cas favorables à cet événement, au nombre total de
cas favorables ou contraires, et tous également possibles, ou qui
ont tous une même chance [p. 31],


and he indicates later on [p. 33] the possibility of (indeed, the necessity for) 
extending this definition to incommensurable quantities. 

Poisson next points out (though of course not in these symbols) that
$\Pr[E] + \Pr[\bar{E}] = 1$, and follows this with the important observation that
when we have no reason to believe in the occurrence of E rather than its
complement $\bar{E}$, each should be assigned probability 1/2. The usual product
rule for the probability of the joint occurrence of two independent events
is stated, and this is extended to the observation that the probability of m
successive happenings of the event E is $p^m$ (where Pr[E] = p). The extension
to non-independent (or dependent) events is made, i.e. $\Pr[E \,\&\, E_1] =
\Pr[E] \Pr[E_1 \mid E]$, where E denotes the event⁹ “qui doit arriver le premier”
[p. 41], and expressions are given for probabilities resulting from the withdrawal
of balls, both with and without replacement, from an urn. In the tenth
article we find a result that we would today write as


\[ \Pr[A] = \sum_j \Pr[A \,\&\, H_j] , \]


and this is illustrated in the eleventh article by typical “urn and balls”
examples¹⁰.
Mathematical expectation is defined (acceptably) as follows: 


Le produit d’un gain et de la probabilité de l’obtenir est ce qu’on 
appelle l’espérance mathématique de chaque personne intéressée 
dans une spéculation quelconque [p. 71], 


and this is contrasted in the twenty-fourth article with espérance morale,
the difference being illustrated by the St Petersburg Paradox¹¹.

The second chapter, occupying nigh on a hundred pages, is entitled
“Suite des règles générales; probabilités des causes et des événements futurs,
déduites de l’observation des événements passés”. Poisson begins by
giving a precise definition of the way in which the word “cause” is to be
used in the calculus of probabilities:

on y considère une cause C, relative à un événement quelconque
E, comme étant la chose qui donne à l’arrivée de E, la chance
déterminée qui lui est propre. [p. 79]


Furthermore,


L’ensemble des causes qui concourent à la production d’un événement
sans influer sur la grandeur de sa chance, c’est-à-dire, sur
le rapport du nombre de cas favorables à son arrivée au nombre
total des cas possibles, est ce qu’on doit entendre par le hasard.
[p. 80]


Poisson now passes, in the twenty-eighth article, to a discrete form of
Bayes’s Theorem. He supposes that the occurrence of an event E may be
attributed to any one of a number m of mutually exclusive and exhaustive
causes, all of which, prior to observation, are equally probable. The question
is the determination of the a posteriori probabilities of these causes. If we
denote the sequence of causes by $\{C_n\}$, we have


\[ \omega_n = \Pr[C_n \mid E] = \Pr[E \mid C_n] \Big/ \sum_i \Pr[E \mid C_i] . \]


In the next article Poisson points out that, in finding the probabilities
of several successive events, one ought to consider not only the effect that
the occurrence of one has on the chance of the following event, but also
sometimes the probabilities of the divers causes of the first event. The
results of this article are extended in the following one to the case of an
event E′ following E, the desired probability (under a suitable, though
unstated, assumption of conditional independence) being given by


\[ \omega' = \Pr[E' \mid E] = \sum_n \Pr[E' \mid C_n] \Pr[E \mid C_n] \Big/ \sum_n \Pr[E \mid C_n] . \]


Telle est la formule qui sert à calculer la probabilité des événements
futurs, d’après l’observation des événements passés.
[p. 87]


In Article 32 Poisson applies his results to some simple examples. In
the first of these (later generalized by Catalan; see §8.8) he considers
the drawing of a white ball from an urn B known to contain m white
or black balls. The probability $\omega_n$ that the urn contains n white balls is
shown to be 2n/m(m + 1), under the assumption that the possible initial
compositions of the urn are equally probable. If now another white ball is
drawn from the urn (event E′), the probability ω′ defined above is found to
be (i) (2m + 1)/3m, if sampling occurs with replacement, and (ii) 2/3 if the
sampling is without replacement. The case in which (m − 1) draws from m
white or black balls have resulted in (m − 1) white balls is also considered.
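
The three results just quoted may be verified by direct enumeration over the m + 1 equally probable compositions of the urn. The Python sketch below does this with exact fractions; the function names and the value m = 10 are illustrative choices only.

```python
from fractions import Fraction


def posterior_white_counts(m):
    """Pr[urn holds n white balls | one white ball drawn], for n = 0..m,
    all m + 1 initial compositions being taken as equally probable."""
    likelihood = [Fraction(n, m) for n in range(m + 1)]
    total = sum(likelihood)
    return [value / total for value in likelihood]


def prob_second_white(m, with_replacement):
    """Probability that a second draw also yields a white ball."""
    posterior = posterior_white_counts(m)
    if with_replacement:
        chance = [Fraction(n, m) for n in range(m + 1)]
    else:
        # One white ball has been removed: n - 1 white among m - 1 balls.
        chance = [Fraction(max(n - 1, 0), m - 1) for n in range(m + 1)]
    return sum(w * c for w, c in zip(posterior, chance))


m = 10  # illustrative urn size
posterior = posterior_white_counts(m)
print(posterior[3], Fraction(2 * 3, m * (m + 1)))
print(prob_second_white(m, True), Fraction(2 * m + 1, 3 * m))
print(prob_second_white(m, False))
```

The value 2/3 obtained without replacement does not depend on m, whereas the value with replacement tends to 2/3 only as m increases.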
In his next article Poisson considers the case in which m is unknown: all
that is known is that m ≤ 3 (say). If E denotes the event that x white balls
have been drawn in a series of n draws (with replacement), with 0 < x < n,
one may suppose that any one of the following three hypotheses about the
composition of the urn holds:


$C_1$. one white and one black ball;
$C_2$. one black and two white balls;
$C_3$. one white and two black balls.


Then

\[ \Pr[E \mid C_1] = (1/2)^x (1/2)^{n-x} = 1/2^n \]

\[ \Pr[E \mid C_2] = (2/3)^x (1/3)^{n-x} = 2^x/3^n \]

\[ \Pr[E \mid C_3] = (1/3)^x (2/3)^{n-x} = 2^{n-x}/3^n . \]


The probabilities $\omega_1$, $\omega_2$ and $\omega_3$ are then easily found. If the event E′ is the
withdrawal of a further white ball from the urn, then


\[ \omega' = [(1/2)3^n + (2/3)2^{n+x} + (1/3)2^{2n-x}] \big/ (3^n + 2^{n+x} + 2^{2n-x}) . \]


Detailed examination of the cases (i) n = 2x, (ii) x = 2i and n = 3i,
(iii) n = 3x follows, and Poisson notes that, as the number of withdrawals
increases, ω′ tends in these three instances to 1/2, 2/3 and 1/3 respectively.
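
The behaviour of ω′ in the three cases can be examined with exact arithmetic. The sketch below evaluates the expression for ω′ given above and, as a check, recomputes it directly from the three hypotheses taken as equally probable beforehand; the function names and the particular values of x and n are illustrative assumptions.

```python
from fractions import Fraction


def omega_prime(x, n):
    """Closed form for the chance of a further white ball after x white
    balls in n draws with replacement."""
    numerator = (Fraction(1, 2) * 3**n
                 + Fraction(2, 3) * 2 ** (n + x)
                 + Fraction(1, 3) * 2 ** (2 * n - x))
    denominator = 3**n + 2 ** (n + x) + 2 ** (2 * n - x)
    return numerator / denominator


def omega_prime_direct(x, n):
    """The same quantity from the three equally probable hypotheses."""
    white_chance = [Fraction(1, 2), Fraction(2, 3), Fraction(1, 3)]
    likelihood = [w**x * (1 - w) ** (n - x) for w in white_chance]
    total = sum(likelihood)
    return sum(w * l for w, l in zip(white_chance, likelihood)) / total


for x, n in [(100, 200), (200, 300), (100, 300)]:
    print(x, n, float(omega_prime(x, n)))
```

In case (i) the hypotheses $C_2$ and $C_3$ are equally favoured by the observations, and the value 1/2 is then exact for every n rather than merely a limit.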

In his thirty-fourth article Poisson considers the case in which the causes
are not initially equally probable, expressions of the usual form for $\omega_n$ and
ω′ being obtained (consideration is also given to the case of the occurrence
of yet another event E″ following on the occurrence of E′, which in turn
followed E). The theory is followed in Article 35 by an example, and in
the following articles application is made to the question of testimony, an
important observation being the following:


la probabilité d’un événement qui nous est transmis par une
chaîne traditionnelle d’un très grand nombre de témoins, ne diffère
pas sensiblement de la chance propre de cet événement, ou
indépendante du témoignage; tandis que l’attestation d’un grand
nombre de témoins directs d’un événement rend sa probabilité
très approchante de la certitude, lorsqu’il y a pour chacun de
ces témoins plus d’un contre un à parier qu’il ne nous trompe
pas (n° 37). [p. 112]


In his forty-third article Poisson turns his attention to the case in which
the number of causes to which an event E may be attributed is infinite.
Supposing firstly that the observed event E is the drawing of a white ball
from an urn containing an infinite number of white and black balls, Poisson
considers firstly the case in which the initial distribution of x, the ratio of
white balls to the total number, is uniform (as we would phrase it today),
obtaining for the probability ω of x the ratio


\[ \omega = X\,dx \Big/ \int_0^1 X\,dx , \]


where X denotes the probability that x, if it were certain, would give to the
occurrence of E. Similarly, if E′ is a future event depending on the same
causes as E, with corresponding probability X′, we have


\[ \omega' = \int_0^1 XX'\,dx \Big/ \int_0^1 X\,dx . \]


If, on the other hand, the initial values of x are not equally probable but
follow some distribution Y, then


\[ \omega = XY\,dx \Big/ \int_0^1 XY\,dx \]

\[ \omega' = \int_0^1 XX'Y\,dx \Big/ \int_0^1 XY\,dx . \]


In his next article Poisson shows effectively that


\[ \Pr[a < x < b \mid E] = \int_a^b f(x)\varphi(x)\,dx \Big/ \int_0^1 f(x)\varphi(x)\,dx , \]


where, as before, x denotes the prior probability of E. As an illustration,
he considers the case of sampling with replacement from an urn containing
a vast number of balls (as many white as black), the sampling resulting in
the obtaining of n white balls in n draws (event E). In this case $f(x) = x^n$,
with all possible values of x being equally probable (so that φ is constant),
and hence the probability that the urn contains more white than black balls
will be

\[ \int_{1/2}^{1} x^n\,dx \Big/ \int_0^1 x^n\,dx = 1 - (1/2)^{n+1} . \]

In Article 45 Poisson supposes that E and E′ are both events that are
composed of the same simple event G, the chance that the probability of
this event is x being Y dx before the occurrence of E and ω thereafter. The
notation is Poisson’s: one might perhaps rather write the first part of this
supposition as


\[ \Pr[x < p_G < x + dx] = f(x)\,dx , \]


where $\int_0^1 f(x)\,dx = 1$. Poisson next suggests that, according to the rule of
mathematical expectation, one ought to take as the unknown value of the
chance of G, before the occurrence of E, the value


\[ \gamma = \int_0^1 x f(x)\,dx . \]

As an example he considers the case in which $p_G$ is uniformly distributed
over the interval (0,1), obtaining γ = 1/2 and ω′ = 2/3 when E and E′ are
both the event G. He also deduces that the probability that G will occur
on the second trial if it has failed to occur on the first, is


\[ \omega' = \int_0^1 x(1-x)\,dx \Big/ \int_0^1 (1-x)\,dx = \frac{1}{3} . \]


In the forty-sixth article Poisson considers the case in which E is the
event that G has occurred m times and $\bar{G}$ (or H, in Poisson’s notation) n
times, and the future event E′ is the occurrence of G and of $\bar{G}$, m′ and n′
times respectively. We then have


\[ \omega' = \binom{m'+n'}{m'} \int_0^1 x^{m+m'} (1-x)^{n+n'} f(x)\,dx \Big/ \int_0^1 x^m (1-x)^n f(x)\,dx \]


(here the events in E, and in E′, may occur in any order whatsoever).
In the case in which f(·) is a uniform density, Poisson shows that


\[ \omega' = \binom{m+m'}{m} \binom{n+n'}{n} \Big/ \binom{m+m'+n+n'+1}{m+n+1} , \]

which reduces, when n = 0 = n′, to


\[ \omega' = (m+1)/(m+m'+1) . \]
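
For a uniform density the two integrals are Beta functions with integral arguments, so that the passage from the integral form to the expression in binomial coefficients can be checked exactly. The following sketch does this; the function names and the numerical arguments are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """int_0^1 x^a (1-x)^b dx = a! b! / (a + b + 1)! for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive_from_integrals(m, n, m2, n2):
    """Ratio of integrals for a uniform f, times the number of orderings."""
    return comb(m2 + n2, m2) * beta_integral(m + m2, n + n2) / beta_integral(m, n)


def predictive_closed_form(m, n, m2, n2):
    """The same probability written with three binomial coefficients."""
    return Fraction(comb(m + m2, m) * comb(n + n2, n),
                    comb(m + m2 + n + n2 + 1, m + n + 1))


print(predictive_from_integrals(5, 3, 2, 1), predictive_closed_form(5, 3, 2, 1))
print(predictive_from_integrals(4, 0, 3, 0))  # the case n = 0 = n'
```

Taking m = 1 = m′ and n = 0 = n′ recovers the value 2/3 found in Article 45.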


There follows, in Article 47, a consideration of the case x = r + z, where
r should be taken equal to γ. On letting m′ = 1 and n′ = 0, and changing
the variable of integration from x to z, Poisson shows that, neglecting terms
of small order,

\[ \omega' = r + [m/r - n/(1-r)]h , \]

where $h = \int f(x)\,z^2\,dz$. In the subsequent article he uses this result to
compare the probability of two similar outcomes (i.e. both G or both $\bar{G}$)
with that of dissimilar outcomes.

Poisson now passes to the statement of his generalization of Bernoulli’s
law of large numbers¹³ (its proof being postponed to his third chapter), and
illustrates its use with some numerical data from Buffon’s Essai d’arithmétique
morale.

The third chapter of Poisson’s work is entitled “Calcul des probabilités
qui dépendent de très grands nombres”. In Article 71 the rule of succession
receives further attention: assuming that E and F are complementary
events with constant but unknown probabilities p and q, Poisson supposes
that in μ = m + n trials E and F have occurred m and n times respectively.
The probability U′ that in μ′ further trials E and F will occur m′ and n′
times respectively is given by¹⁴


\[ U' = \binom{m+m'}{m} \binom{n+n'}{n} \Big/ \binom{\mu+\mu'+1}{\mu+1} . \]
Using the approximation
\[ n! \approx n^n e^{-n} \sqrt{2\pi n} , \]
Poisson shows that 


me Hemme (nt ntyrtm (w+ 1)4 
: mma at + Ly 
/ / 
where H = ane and 


re th (mt m' (n+ nut) 
pt ul + mn(utpi+ 1) | 


Under the further assumption that m′ and n′ are very small in comparison
with m and n, Poisson deduces that


\[ U' = \binom{m'+n'}{m'} (m/\mu)^{m'} (n/\mu)^{n'} , \]


which agrees with the usual binomial probability for the occurrence of
m′ E’s and n′ F’s in (m′ + n′) trials, where E and F have the given a
priori probabilities p = m/μ and q = n/μ. Poisson notes further that this
pleasing state of affairs ceases to obtain if m′ and n′ are of comparable
magnitudes to m and n: indeed, in this case we obtain the (approximate)
probability

\[ \frac{1}{\sqrt{1+h}} \binom{m'+n'}{m'} (m/\mu)^{m'} (n/\mu)^{n'} , \]


where m′ = mh, n′ = nh and μ′ = μh.
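
Both regimes can be seen numerically by comparing the exact probability under a uniform prior distribution with the binomial probability computed from p = m/μ and q = n/μ. The sketch below reads the factor in the second regime as 1/√(1 + h); this reading, the function names and the counts are assumptions made for illustration.

```python
from math import exp, lgamma, log, sqrt


def log_comb(a, b):
    """Logarithm of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def predictive(m, n, m2, n2):
    """Exact probability of m2 E's and n2 F's in further trials, under a
    uniform prior: C(m+m2, m) C(n+n2, n) / C(m+m2+n+n2+1, m+n+1)."""
    return exp(log_comb(m + m2, m) + log_comb(n + n2, n)
               - log_comb(m + m2 + n + n2 + 1, m + n + 1))


def binomial_plug_in(m, n, m2, n2):
    """Binomial probability with p = m/mu and q = n/mu taken as known."""
    mu = m + n
    return exp(log_comb(m2 + n2, m2) + m2 * log(m / mu) + n2 * log(n / mu))


m, n = 6000, 4000  # hypothetical past observations

# Future trials few in comparison with past ones: ratio close to 1.
ratio_small = predictive(m, n, 6, 4) / binomial_plug_in(m, n, 6, 4)

# Future trials as numerous as past ones (h = 1): ratio close to 1/sqrt(2).
ratio_equal = predictive(m, n, 6000, 4000) / binomial_plug_in(m, n, 6000, 4000)

print(f"{ratio_small:.4f}  {ratio_equal:.4f}  {1 / sqrt(2):.4f}")
```

The reduction by the factor 1/√(1 + h) reflects the additional uncertainty attaching to p itself when the future trials are as numerous as those already observed.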

Several articles of the third chapter are devoted to Bernoulli’s Theorem:
since we shall need to refer to Poisson’s statement of it in the near future,
we shall state it here. Thus let p and q be the known and constant chances
of the events E and F; let μ (a large number) trials be made¹⁵, and let N
be the greatest integer not exceeding μq. Further, let u be a quantity such
that $u\sqrt{2(\mu+1)pq}$ is an integer that is very small in relation to N. If m
and n are the numbers of occurrences of E and F in the μ trials, then¹⁶


\[ R = \Pr[N - u\sqrt{2\mu pq} \le n \le N + u\sqrt{2\mu pq}]
   = 1 - \frac{2}{\sqrt{\pi}} \int_u^{\infty} e^{-t^2}\,dt + \frac{e^{-u^2}}{\sqrt{2\pi\mu pq}} . \qquad (10) \]


Various developments are also given: for example, n may be restricted to lie
in a half-open or open interval, or one may neglect the difference between μq
and N, in which case it is found that (10) yields $\Pr[|n/\mu - q| \le u\sqrt{2pq/\mu}]$.
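
The accuracy of (10) may be gauged by comparison with an exact binomial sum. In the sketch below the formula is read as R = 1 − (2/√π)∫ᵤ^∞ e^{−t²} dt + e^{−u²}/√(2πμpq), the integer k stands for u√(2(μ + 1)pq), and the values of μ, p and k are hypothetical.

```python
from math import comb, erfc, exp, pi, sqrt


def exact_probability(mu, q, N, k):
    """Exact Pr[N - k <= n <= N + k], n being the number of F's in mu trials."""
    p = 1 - q
    return sum(comb(mu, n) * q**n * p ** (mu - n) for n in range(N - k, N + k + 1))


def poisson_R(mu, p, q, u):
    """Right-hand side of (10): 1 - erfc(u) + e^{-u^2} / sqrt(2 pi mu p q)."""
    return 1 - erfc(u) + exp(-u * u) / sqrt(2 * pi * mu * p * q)


mu, p = 1000, 0.25                     # hypothetical number of trials and chance of E
q = 1 - p
N = int(mu * q)                        # greatest integer not exceeding mu * q
k = 20                                 # plays the part of u * sqrt(2 (mu + 1) p q)
u = k / sqrt(2 * (mu + 1) * p * q)

exact = exact_probability(mu, q, N, k)
approx = poisson_R(mu, p, q, u)
print(f"exact = {exact:.5f}   formula (10) = {approx:.5f}")
```

The final term of (10) accounts for the inclusion of the end points of the interval; without it the agreement is noticeably poorer for moderate values of μ.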


Poisson turns his attention in his eighty-third article to the case in which
the probabilities p and q are not given, although the ratios m/μ and n/μ
have been observed. In this case, he states,


les formules que nous avons trouvées feront connaître les valeurs
très probables et très approchées des inconnues p et q. [p. 209]


It thus follows that (10) yields the probability


\[ \Pr[|p - m/\mu| \le u\sqrt{2pq/\mu}] \]


(where it should be stressed that p and q are unknown). If R differs only
slightly from 1, the terms p and q under the root sign may be replaced by
m/μ and n/μ respectively, yielding


\[ \Pr[|p - m/\mu| \le (u/\mu)\sqrt{2mn/\mu}]
   = 1 - \frac{2}{\sqrt{\pi}} \int_u^{\infty} e^{-t^2}\,dt + e^{-u^2} \sqrt{\mu/2\pi mn} . \qquad (11) \]


Poisson points out further that the same result¹⁷ may be used in connexion
with future events. Thus, if m, n and μ are large, if μ′ further trials
result in m′ occurrences of E and n′ occurrences of F, and if μ′, although
small in comparison to μ, is still very large, we have


\[ \Pr[|n'/\mu' - n/\mu| \le (u/\mu)\sqrt{2mn/\mu'}] = R . \]


Finally, Poisson concludes this article by noting that the results obtained
so far by an inversion of Bernoulli’s Theorem, are inapplicable if μ and μ′
are of comparable magnitude: he proposes therefore to consider another
line of approach.

Once again it is supposed that E and F, of constant probabilities p and
q, have occurred m and n times in a large number μ = m + n of trials.
Then m/μ and n/μ may be taken as the approximate values of p and q.
Now in his proof of Bernoulli’s Theorem Poisson had shown that


\[ Q = \Pr[n < \mu q - r\sqrt{2\mu pq}]
   = \frac{1}{\sqrt{\pi}} \int_{r+\delta}^{\infty} e^{-t^2}\,dt + \frac{e^{-r^2}(q-p)}{3\sqrt{2\pi\mu pq}} , \qquad (12) \]


where $\delta = (p-q)r^2 \big/ 3\sqrt{2(\mu+1)pq}$. The left-hand side of (12) is, of course,
to be regarded as a function of known p and q: in the present discussion,
however, Poisson glibly passes from this to


\[ \Pr[q > n/\mu + r\sqrt{2pq/\mu}] \]




where q is “la chance inconnue ... de l’événement F” [p. 211]. Replacing p
and q on the right-hand side of the last inequality by their limiting values
m/μ and n/μ, he obtains


\[ \Pr[q > n/\mu + (r/\mu)\sqrt{2mn/\mu}] = Q . \]


It then follows that


\[ -\frac{dQ}{dr}\,dr = \Pr[n/\mu + (r/\mu)\sqrt{2mn/\mu} < q < n/\mu + ((r+dr)/\mu)\sqrt{2mn/\mu}] \]


or, as Poisson has it,


−(dQ/dr) dr exprimera la probabilité infiniment petite que l’on a
précisément
\[ q = n/\mu + (r/\mu)\sqrt{2mn/\mu} \]
pour toutes les valeurs de r positives et très petites par rapport
à √μ, comme le suppose l’expression de Q. [p. 211]


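Similarly, from


\[ Q' = \Pr[q > (n/\mu) - (r'/\mu)\sqrt{2mn/\mu}]
   = 1 - \frac{1}{\sqrt{\pi}} \int_{r'-\delta'}^{\infty} e^{-t^2}\,dt + \frac{e^{-r'^2}(q-p)}{3\sqrt{2\pi\mu pq}} , \]


it follows that (dQ′/dr′) dr′ equals the probability that

\[ q = n/\mu - (r'/\mu)\sqrt{2mn/\mu} . \]

Here $\delta' = (p-q)r'^2 \big/ 3\sqrt{2\mu pq}$.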


Working from the integral forms of Q and Q′, and replacing p and q by
m/μ and n/μ in both δ and δ′, Poisson finds that


\[ -\frac{dQ}{dr} = \frac{1}{\sqrt{\pi}} e^{-r^2} - \frac{2(m-n)r^3 e^{-r^2}}{3\sqrt{2\pi\mu mn}} \]

\[ \frac{dQ'}{dr'} = \frac{1}{\sqrt{\pi}} e^{-r'^2} + \frac{2(m-n)r'^3 e^{-r'^2}}{3\sqrt{2\pi\mu mn}} \]


on neglecting terms having μ as divisor. It follows that the expression¹⁸


\[ V = \frac{1}{\sqrt{\pi}} e^{-v^2} - \frac{2(m-n)v^3 e^{-v^2}}{3\sqrt{2\pi\mu mn}} \]


yields V dv for the probability that $q = n/\mu + (v/\mu)\sqrt{2mn/\mu}$. Since p = 1 − q
and m = μ − n, it is immediate that this expression is also the probability
that $p = m/\mu - (v/\mu)\sqrt{2mn/\mu}$.
Todhunter [1865, art. 997] has pointed out that


\[ \int_{-u}^{u} V\,dv = \frac{2}{\sqrt{\pi}} \int_0^u e^{-t^2}\,dt , \]


which is different from the result given in (11) above. He finds it “curious”
that Poisson failed to comment on the difference between the results
obtained by these two methods: however, since the results require the
substitution of m/μ for p and n/μ for q at different stages in the proof, one
need not be surprised that different answers are obtained, especially since
both are in fact approximations.
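
The size of the discrepancy noted by Todhunter is easily exhibited. The sketch below takes V to be (1/√π)e^{−v²} − 2(m − n)v³e^{−v²}/3√(2πμmn) and (11) to be 1 − (2/√π)∫ᵤ^∞ e^{−t²} dt + e^{−u²}√(μ/2πmn); these readings of the two expressions, and the counts m and n, are assumptions made for illustration.

```python
from math import erf, exp, pi, sqrt


def V(v, m, n):
    """Density read from the text, including its odd correction term in v^3."""
    mu = m + n
    correction = 2 * (m - n) * v**3 * exp(-v * v) / (3 * sqrt(2 * pi * mu * m * n))
    return exp(-v * v) / sqrt(pi) - correction


def integrate(f, a, b, steps=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def formula_11(u, m, n):
    """1 - (2/sqrt(pi)) int_u^inf e^{-t^2} dt + e^{-u^2} sqrt(mu / (2 pi m n))."""
    mu = m + n
    return erf(u) + exp(-u * u) * sqrt(mu / (2 * pi * m * n))


m, n, u = 600, 400, 1.0  # hypothetical counts and limit
integral = integrate(lambda v: V(v, m, n), -u, u)
print(f"integral of V = {integral:.6f}   erf(u) = {erf(u):.6f}")
print(f"formula (11)  = {formula_11(u, m, n):.6f}")
```

The term in v³ contributes nothing over a symmetric interval, so that the two results differ precisely by the final term of (11), a quantity of order 1/√μ.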

To conclude this article Poisson considers the finding of the probability
of a future event E′, an event that consists in the occurrence of certain
numbers of E’s and F’s. Denoting by Π the probability of E′ for given values
of Pr[E] and Pr[F] (so that Π is a function of p and q), he concludes
that the desired probability is given (approximately) by


\[ \omega' = \int \Pi V\,dv , \]


and he notes further that


Ce résultat s’accorde avec celui qui a été obtenu plus directement,
dans le second paragraphe de mon mémoire sur la proportion
des naissances des deux sexes. [p. 214]


Having discussed the relevant theory, Poisson now turns his attention to
some examples. In his eighty-fifth article he considers the probability Π′
that μ′ further trials will yield m′ occurrences of E and n′ occurrences of
F. Under the assumption that the ratio m′ : n′ is approximately the same
as that of m : n, he deduces, in a manner similar to that used earlier, that


n= S20" f exp( —v? — p? y?/2m'n') dv 


where $U' = \sqrt{\mu'/2\pi m'n'}$. On replacing m′ and n′ by mh and nh, and
neglecting sufficiently small terms, one finds that

\[ \Pi' = \frac{1}{\sqrt{1+h}}\, U' , \]


a result that should be compared with that given in Article 71.

In the next article Poisson considers, in the notation of the preceding
example, the probability Π′ that |n′/μ′ − n/μ| does not exceed a/√μ′,
where a is defined by m′ = mh − a√μ′ (the details will not be given here).
The matter is pursued further in the following article.

In Article 88 Poisson turns his attention to a further development of the
theory¹⁹. Suppose that the complementary events $E$ and $F$, of unknown
chances $p$ and $q$, have occurred $m$ and $n$ times in a large number $\mu = m + n$
of trials. Suppose further that the complementary events $E_1$ and $F_1$, of
unknown chances $p_1$ and $q_1$, have occurred $m_1$ and $n_1$ times in a large
number $\mu_1 = m_1 + n_1$ of trials. The aim is to determine the probability
of an inequality between $p$ and $p_1$, $q$ and $q_1$, corresponding to differences
between the ratios $m/\mu$ and $m_1/\mu_1$, $n/\mu$ and $n_1/\mu_1$. Setting


\[ m_1/\mu_1 - m/\mu = \delta , \]


Poisson shows, by approximations similar to those carried out before, that
for $\epsilon$ positive and small,


\[ \lambda = \Pr[p_1 > p + \epsilon] = \iint V V_1 \, dv \, dv_1 , \]


where $V$ is the expression introduced earlier, and $V_1$ is the corresponding
formula mutatis mutandis. Neglect of the second terms in $V$ and $V_1$
occasions

\[ \lambda \approx \frac{1}{\pi} \iint \exp(-v^2 - v_1^2) \, dv \, dv_1 . \]


After various substitutions Poisson obtains


\[ \lambda = \begin{cases} \dfrac{1}{\sqrt{\pi}} \displaystyle\int_u^\infty e^{-t^2}\,dt , & \text{if } \epsilon - \delta > 0 \\[2ex] 1 - \dfrac{1}{\sqrt{\pi}} \displaystyle\int_u^\infty e^{-t^2}\,dt , & \text{if } \epsilon - \delta < 0 \end{cases} \]

where

\[ u = \pm \frac{(\epsilon - \delta)\,\mu\mu_1\sqrt{\mu\mu_1}}{\sqrt{2(\mu^3 m_1 n_1 + \mu_1^3 m n)}} . \]

Indeed, the neglect of the second terms in $V$ and $V_1$ in fact results in
$\lambda = \Pr[p_1 > p + \epsilon]$.

As an illustration Poisson considers in Article 89 the results of a coin-
tossing experiment carried out by Buffon²⁰: once again we shall omit the
details. The remaining articles of this chapter are devoted to the question
of repeated draws, without replacement, from an urn containing white and
black balls, the draws resulting in long sequences of lengths $\mu, \mu', \ldots$. The
preceding theory is applied to find the probability that the number of trials
in which the number of white balls drawn exceeds the number of black
balls drawn, lies within given bounds, an expression of the form (10) being
obtained. An application to elections follows.

In his fifth chapter, entitled “Application des régles générales des prob- 
abilités aux décisions des jurys et aux jugements des tribunaux”, Poisson 
makes use of the Bayes-type formulae developed in earlier chapters²¹.
Starting off in Article 114 with the simplest of situations, he examines the
case of a single juror. He supposes that the probability $k$ that an accused,
on arraignment or indictment, is guilty, is based on preliminary information
and the subsequent accusation. On denoting by $u$ the probability that the
juror is not mistaken in his decision, we find that the probability $\gamma$ that
the accused will be convicted is given by


\[ \gamma = ku + (1-k)(1-u) . \]


Then the probability that he is guilty, given that he has been convicted, is,
by Article 34,

\[ p = ku/[ku + (1-k)(1-u)] . \]


Similarly the probability $q$ that he is innocent, given that he is acquitted²²,
is


\[ q = (1-k)u/[(1-k)u + k(1-u)] . \]


Further developments of this single-juror case follow: they need not be our 
concern. 
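
The single-juror formulae can be checked numerically. What follows is a minimal sketch in Python, not part of Poisson's text: the function name `single_juror` is hypothetical, and exact fractions are used so that the output can be compared with a hand calculation.

```python
from fractions import Fraction


def single_juror(k, u):
    """Return (gamma, p, q) for prior probability of guilt k and juror reliability u.

    gamma: probability that the accused is convicted
    p:     probability of guilt given conviction
    q:     probability of innocence given acquittal
    """
    gamma = k * u + (1 - k) * (1 - u)
    p = k * u / gamma
    q = (1 - k) * u / ((1 - k) * u + k * (1 - u))
    return gamma, p, q


# Illustrative values only: k = 2/3, u = 3/4.
print(single_juror(Fraction(2, 3), Fraction(3, 4)))
```

With these illustrative values the conviction probability is 7/12, the probability of guilt after conviction is 6/7, and the probability of innocence after acquittal is 3/5.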

In the next article Poisson broadens the scope of his investigation to
encompass the addition of a second juror. The concern here is to determine
$\Pr[C_1 C_2]$, the probability that the accused is convicted by each juror,
$\Pr[C_1\overline{C}_2 \vee \overline{C}_1 C_2]$, the probability that he is convicted by one and acquitted
by the other, and $\Pr[\overline{C}_1\overline{C}_2]$, the probability that he is acquitted by
both. Now $\Pr[C_1 C_2] = \Pr[C_1]\Pr[C_2 \mid C_1]$, and, as in the preceding article,


\[ \Pr[C_1] = \gamma_1 = ku_1 + (1-k)(1-u_1) , \]


the subscript referring to the first juror. For the second juror, however, $k$
is replaced by $p_1$ ($= p$ of the preceding article), and hence


\[ \Pr[C_2 \mid C_1] = \gamma_2 = p_1 u_2 + (1-p_1)(1-u_2) . \]

It thus follows that

\[ \Pr[C_1 C_2] = k u_1 u_2 + (1-k)(1-u_1)(1-u_2) . \]

A similar argument shows that

\[ \Pr[\overline{C}_1\overline{C}_2] = k(1-u_1)(1-u_2) + (1-k)u_1 u_2 , \]


and it hence follows that the probability that both jurors arrive at the same
conclusion is


\[ \Pr[C_1 C_2 \vee \overline{C}_1\overline{C}_2] = u_1 u_2 + (1-u_1)(1-u_2) \]

(independent of $k$). Further argument in the same vein yields


\[ \Pr[C_1\overline{C}_2 \vee \overline{C}_1 C_2] = (1-u_1)u_2 + (1-u_2)u_1 . \]

The probability $\Pr[G \mid C_1 C_2]$ that the accused is guilty, given that he
has been convicted by both jurors, will be, by an argument similar to one
used in the previous article,


\[ \Pr[G \mid C_1 C_2] = pu_2/[pu_2 + (1-p)(1-u_2)] , \]


and similarly


\[ \Pr[\overline{G} \mid \overline{C}_1\overline{C}_2] = qu_2/[qu_2 + (1-q)(1-u_2)] \]
\[ \Pr[G \mid \overline{C}_1 C_2] = (1-q)u_2/[(1-q)u_2 + q(1-u_2)] \]
\[ \Pr[\overline{G} \mid C_1\overline{C}_2] = (1-p)u_2/[(1-p)u_2 + p(1-u_2)] . \]


On writing these last two expressions in terms of $k$, $u_1$ and $u_2$ one finds
that $\Pr[G \mid \overline{C}_1 C_2] = k$ and $\Pr[\overline{G} \mid C_1\overline{C}_2] = 1 - k$ when $u_1 = u_2$, as might
well be expected.
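
The two-juror relations lend themselves to a quick numerical check. The sketch below is an illustration only, assuming (as the text does) that the jurors err independently given guilt or innocence; the function name `two_jurors` and the dictionary keys are hypothetical.

```python
from fractions import Fraction


def two_jurors(k, u1, u2):
    """Joint verdict probabilities for two jurors with reliabilities u1 and u2.

    k is the prior probability of guilt; the jurors are assumed to decide
    independently once guilt or innocence is fixed.
    """
    both_convict = k * u1 * u2 + (1 - k) * (1 - u1) * (1 - u2)
    both_acquit = k * (1 - u1) * (1 - u2) + (1 - k) * u1 * u2
    split = (1 - u1) * u2 + (1 - u2) * u1
    # Guilt given that the first juror acquits and the second convicts.
    acquit_then_convict = k * (1 - u1) * u2 + (1 - k) * u1 * (1 - u2)
    return {
        "both_convict": both_convict,
        "both_acquit": both_acquit,
        "split": split,
        "guilty_given_both_convict": k * u1 * u2 / both_convict,
        "guilty_given_acquit_then_convict": k * (1 - u1) * u2 / acquit_then_convict,
    }


result = two_jurors(Fraction(2, 5), Fraction(3, 4), Fraction(3, 4))
print(result["guilty_given_acquit_then_convict"])  # equals k when u1 == u2
```

When the two reliabilities coincide, a split verdict leaves the prior probability of guilt unchanged, in agreement with the remark that closes the paragraph above.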

In the following articles similar results are obtained for more than two
jurors: we shall not present intermediate results here, but shall pass immediately
to the case in which the accused is convicted by at least $(n - i)$ votes
and acquitted by at most $i$, when $(n - i)$ and $i$ are very large²³. Denoting
this probability by $c_i$, and letting $d_i$ be the probability that the accused is
acquitted by at least $(n - i)$ votes and convicted by at most $i$, we have, as
in the preceding cases,


\[ c_i = kU_i + (1-k)V_i ; \qquad d_i = kV_i + (1-k)U_i \]

where

\[ U_i = \sum_{j=0}^{i} \binom{n}{j} u^{n-j}(1-u)^j \quad \text{and} \quad V_i = \sum_{j=0}^{i} \binom{n}{j} (1-u)^{n-j} u^j . \]
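
For moderate $n$ the quantities $c_i$ and $d_i$ can be evaluated exactly rather than by approximation. The following sketch assumes that $U_i$ is the probability that at most $i$ of $n$ jurors err and $V_i$ the probability that at most $i$ of them are right, each juror being right with the same chance $u$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def tail(n, i, u):
    """Probability that at most i of n independent jurors fail, each succeeding with chance u."""
    return sum(comb(n, j) * u ** (n - j) * (1 - u) ** j for j in range(i + 1))


def conviction_acquittal(k, u, n, i):
    """Return (c_i, d_i) for prior probability of guilt k and common juror reliability u."""
    U = tail(n, i, u)        # at most i jurors are mistaken
    V = tail(n, i, 1 - u)    # at most i jurors are right
    return k * U + (1 - k) * V, k * V + (1 - k) * U


# Illustrative values: twelve jurors, at most five dissenting votes.
c, d = conviction_acquittal(Fraction(1, 2), Fraction(3, 4), 12, 5)
print(float(c), float(d))
```

With $n = 1$ and $i = 0$ the first component reduces to the single-juror conviction probability $ku + (1-k)(1-u)$.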


The methods of approximation introduced in Article 77 yield 


a aa [| Paete” (ns iyv3 [sani 


V; 


os =| e* dete" (n+ i)v2 /3,/anin—) , 
where $\theta > 0$ satisfies

\[ \theta^2 = i \ln[i/v(n+1)] + (n+1-i)\ln[(n+1-i)/u(n+1)] \]
with $v = 1 - u$.


Various developments of these formulae follow: once again they need not
concern us.

Poisson begins his Article 124 with the following observation:


Les formules précédentes donneraient les solutions complètes de
toutes les questions relatives à l’objet de ce chapitre, si avant
le jugement, la probabilité $k$ de la culpabilité était connue, et
que l’on connût aussi, pour chaque juré et dans chaque affaire,
la probabilité qu’il ne se trompera pas; ou bien, si cette chance
de ne pas se tromper a plusieurs valeurs possibles, il faudrait que
toutes ces valeurs fussent données, ainsi que leurs probabilités
respectives; ou bien encore, quand ces valeurs sont en nombre
infini et ont chacune une probabilité infiniment petite, il serait
nécessaire que nous connussions la fonction qui exprime la loi
de leurs probabilités. [p. 345]


In an attempt to eliminate these unknown elements Poisson supposes
in Article 125 that the jurors have the same chance of being mistaken, a
chance $U$ that has probability density function $\varphi$. It follows that


\[ \lambda_i = \Pr[\ell < U < \ell'] = \frac{k \int_\ell^{\ell'} u^{n-i}(1-u)^i \varphi(u)\,du + (1-k) \int_\ell^{\ell'} u^i(1-u)^{n-i} \varphi(u)\,du}{k \int_0^1 u^{n-i}(1-u)^i \varphi(u)\,du + (1-k) \int_0^1 u^i(1-u)^{n-i} \varphi(u)\,du} . \qquad (13) \]

Various special cases of this formula are then considered²⁴: in the case in
which $n = 2i$, $\lambda_i$ is seen to be independent of $k$, as in the case in which
$\varphi(1-u) = \varphi(u)$ and $\ell' = 1 - \ell$ with $\ell < \frac{1}{2}$.

The a posteriori probability of guilt is found in the following article to
be


\[ \zeta_i = \frac{k \int_0^1 u^{n-i}(1-u)^i \varphi(u)\,du}{k \int_0^1 u^{n-i}(1-u)^i \varphi(u)\,du + (1-k) \int_0^1 u^i(1-u)^{n-i} \varphi(u)\,du} , \qquad (14) \]


and this reduces to $k$ when $n = 2i$ or $\varphi(1-u) = \varphi(u)$. Analogous formulae
are given for the case in which one knows merely that the accused has been
convicted by a majority of at least $m$, or $(n - 2i)$, votes.
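
The posterior probability of guilt under a density for the jurors' common reliability can be approximated by simple quadrature. The sketch below is illustrative only: the midpoint rule, the step count, and the two example densities (one flat on the whole interval, one flat on the upper half only) are assumptions made for the example, and the function names are hypothetical.

```python
def integrate(f, a, b, steps=4000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (s + 0.5) * h) for s in range(steps))


def posterior_guilt(k, n, i, phi):
    """Probability of guilt after a verdict of n - i votes to i, for reliability density phi."""
    guilty = integrate(lambda u: u ** (n - i) * (1 - u) ** i * phi(u), 0.0, 1.0)
    innocent = integrate(lambda u: u ** i * (1 - u) ** (n - i) * phi(u), 0.0, 1.0)
    return k * guilty / (k * guilty + (1 - k) * innocent)


def flat(u):
    """Density symmetric about 1/2."""
    return 1.0


def upper_half(u):
    """Density that vanishes below 1/2 and is constant above it."""
    return 2.0 if u > 0.5 else 0.0


print(posterior_guilt(0.3, 12, 4, flat))        # symmetric density: stays close to k
print(posterior_guilt(0.3, 12, 4, upper_half))  # reliable jurors: rises above k
```

A symmetric density, or an evenly divided jury ($n = 2i$), returns the prior value $k$, which matches the two reductions noted above.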

In Articles 128 and 129 Poisson passes to the case in which $(n - i)$ and
$i$ are very large numbers. After various approximations he arrives at


\[ \lambda_i = \Pr\left[ (n-i)/n - \delta\sqrt{2i(n-i)/n^3} < U < (n-i)/n + \delta\sqrt{2i(n-i)/n^3} \right] \]


\[ = \frac{2\zeta_i}{\sqrt{\pi}} \int_0^\delta e^{-z^2}\,dz , \]


where $\delta > 0$ is very small in relation to $\sqrt{n}$ and


\[ \zeta_i = k\varphi((n-i)/n) / [k\varphi((n-i)/n) + (1-k)\varphi(i/n)] . \]

In the next two articles, with much labour, Poisson derives the following
approximate results:


(i) the posterior probability of guilt, after conviction by at least $(n - i)$
votes, is


\[ Z_i = k\int_\alpha^1 \varphi(u)\,du \Big/ \left[ k\int_\alpha^1 \varphi(u)\,du + (1-k)\int_0^{1-\alpha} \varphi(u)\,du \right] , \]


where $\alpha = (n - i)/n$;
(ii) the probability, in conviction, that the chance $U$ lies between $\alpha$ and
$1 - \alpha$ (i.e. between $(n - i)/n$ and $i/n$), is


Y;=K (< | e~™ dx + _ a Jf 2i(n — i)/mn , 
where 
wre AC eal A eee 
k fi p(u)dut(1—k) fo * pu) du 
and c > 0 satisfies (n — i)'i"~! = i#(n — i)"~fe-© 


Noting in the next article the need for some specific choice of $\varphi$, Poisson
writes


L’hypothèse que Laplace a faite pour cet objet, consiste à supposer
que la fonction $\varphi u$ soit zéro pour toutes les valeurs de
$u$ moindres que $\frac{1}{2}$ et qu’elle ait une même valeur pour toutes
celles de $u$ qui surpassent $\frac{1}{2}$: [p. 363]

Under this assumption the formula (14) becomes

\[ \zeta_i = \frac{k \int_{1/2}^1 u^{n-i}(1-u)^i\,du}{k \int_{1/2}^1 u^{n-i}(1-u)^i\,du + (1-k) \int_0^{1/2} u^{n-i}(1-u)^i\,du} , \]


a result that coincides with that given by Laplace²⁵ when $k = \frac{1}{2}$ and that
yields
7=0 


1G 
J 


Similarly, if $0 < \delta < \frac{1}{2}$, $k = \frac{1}{2}$, $\ell = \frac{1}{2} - \delta$ and $\ell' = \frac{1}{2} + \delta$, the formula (13)
becomes


\[ \lambda_i = \int_{(1/2)-\delta}^{(1/2)+\delta} u^{n-i}(1-u)^i\,du \Big/ \int_0^1 u^{n-i}(1-u)^i\,du . \]


Some general remarks and numerical examples conclude the work.


8.2 John William Lubbock (1803-1865) &
John Elliot Drinkwater-Bethune
(1801-1851)


An undated and anonymous tract, On Probability, was published in the 
early part of the nineteenth century “under the Superintendence of the So- 
ciety for the Diffusion of Useful Knowledge”. There seems little doubt now, 
however, that this slim volume was the work of Lubbock and Drinkwater-
Bethune²⁶, and from Example 9 (concerned with the odds on certain horses
winning the Gold Cup or the St Leger) we can place its date²⁷ as post May
1828.
In their introductory remarks the authors state that 


It is usual to apply the word belief to the past, and the word 
expectation to the future; but the theory of probability is in all 
respects the same, whether it be applied to past or to future 
events. [art. 3] 


If Shafer [1982] is correct, then $\Pr[E_1 \mid E_2]$ and $\Pr[E_2 \mid E_1]$ (where $E_1$
and $E_2$ are two events with $E_2$ subsequent to $E_1$) require rather different
consideration, and this perhaps casts some suspicion on part of the above
quotation.

The authors define “probability” in the usual way, though a certain mea- 
sure of subjectivism appears. The definition runs as follows: 


the probability of any event is the ratio of the favourable cases 
to all the possible cases which, in our judgement, are similarly 
circumstanced with regard to their happening or failing. 

[art. 4] 


A basic constituent of modern subjective probability is coherence²⁸. We
can perhaps see a precursor of this concept in the following sentence: 


Since the sum of the probabilities of any number of conflicting 
events is equal to unity, we have an equation of condition²⁹ between
the odds; and whenever they do not satisfy this equation,
it is possible to bet with the certainty of gain. [art. 13] 


We now come to that part of the tract devoted to Bayes’s Theorem. At 
the outset Lubbock and Drinkwater-Bethune give a precise definition of 
the probability of a hypothesis as “the number of cases which favour this 
hypothesis divided by the whole number of cases possible” [art. 45]. They 
next derive a discrete form of Bayes’s Theorem, stating it as follows: 


The probability of any hypothesis is the probability of the ob- 
served event upon this hypothesis multiplied by the probability 
of the hypothesis antecedently to the observation divided by
the sum of the products which are formed in the same manner
from all the hypotheses. [art. 45] 


This is followed by a “bag and balls” example in which the ratio of the
number of white balls to the total number of balls (white or black) may be
any of the quantities $x, 2x, \ldots, ix$ with equal probability $1/i$. The authors
deduce the posterior probabilities of these hypotheses after one white ball
has been drawn, and also find the probability that a future draw will yield
a white ball (sampling occurring with replacement). By a rather laborious
argument³⁰, involving the expansion of $\exp(kx)$ for various values of $k$,
this latter probability is shown to be $(2i + 1)x/3$, which, for large $i$, is
approximately $2ix/3$. This in turn “if... the ratio of the white balls may
be any ratio between 0 and unity” [art. 47], becomes 2/3, a result that is
of course more readily obtainable by considering


\[ \int_0^1 x^2\,dx \Big/ \int_0^1 x\,dx . \]
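
The discrete "bag and balls" calculation is short enough to carry out directly. The sketch below (function name hypothetical, exact fractions assumed for the ratio $x$) applies the discrete form of Bayes's Theorem to the $i$ equally probable ratios and then averages over the posterior to obtain the chance of a second white ball.

```python
from fractions import Fraction


def next_white_after_one_white(i, x):
    """Chance of a white ball on the next draw, given one white ball already drawn.

    The ratio of white balls is one of x, 2x, ..., ix, each with prior probability 1/i.
    """
    ratios = [j * x for j in range(1, i + 1)]
    prior = Fraction(1, i)
    evidence = sum(prior * r for r in ratios)
    posterior = [prior * r / evidence for r in ratios]
    return sum(w * r for w, r in zip(posterior, ratios))


# Ten equally probable ratios 1/10, 2/10, ..., 1.
print(next_white_after_one_white(10, Fraction(1, 10)))  # (2*10 + 1) * (1/10) / 3 = 7/10
```

As $i$ grows with $ix = 1$ the value tends to 2/3, the ratio of the two integrals above.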


Lubbock and Drinkwater-Bethune next consider the case in which the ratio
of white balls to the total number may be any one of $\Delta x, 2\Delta x, \ldots, i\Delta x$.
It is shown that, if $m$ white and $n$ black balls have been drawn (in any
order), the probability of drawing a further $m'$ white and $n'$ black balls
becomes, in the limit as $\Delta x \to 0$ and $i\Delta x \to 1$ (and under an assumption
of equi-possibility),


\[ \binom{m'+n'}{m'} \int_0^1 x^{m+m'}(1-x)^{n+n'}\,dx \Big/ \int_0^1 x^m(1-x)^n\,dx . \]


These integrals are then evaluated, and the result obtained is illustrated
by that hoary example of the probability of the sun’s rising once more, if
it has risen 2,000,000 times.


This probability, which is already very great, must be very con- 
siderably increased, if the discoveries of physical astronomy are 
taken into account. [art. 48]


The authors now pass on to note that if $p + q$ draws have resulted in $p$
white and $q$ black balls, the probability of a further white is $(p + 1)/(p + q + 2)$,
that of one more black being $(q + 1)/(p + q + 2)$. Furthermore, these
fractions approximate more and more closely to $p/(p + q)$ and $q/(p + q)$
as $p$ and $q$ increase, an observation that is stated to be the “converse of
Bernoulli’s theorem” [art. 49].
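
Both the general predictive probability and the rule of succession can be evaluated exactly, since the integral of $x^a(1-x)^b$ over the unit interval equals $a!\,b!/(a+b+1)!$ for non-negative integers $a$ and $b$. The following sketch uses that standard identity; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of x**a * (1 - x)**b over [0, 1] for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive(m, n, m2, n2):
    """Chance of m2 further white and n2 further black balls after m white and n black."""
    return comb(m2 + n2, m2) * beta_integral(m + m2, n + n2) / beta_integral(m, n)


def rule_of_succession(p, q):
    """Chance of one more white ball after p white and q black balls."""
    return Fraction(p + 1, p + q + 2)


print(predictive(7, 3, 1, 0))                       # agrees with (7 + 1)/(7 + 3 + 2)
print(float(rule_of_succession(2_000_000, 0)))      # the sunrise example
```

The closed form is used for the sunrise figure because the factorials in `beta_integral` become unwieldy for counts in the millions.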

As an illustration of the foregoing theory, the case is considered of an
individual who has made $(m + n)$ assertions, of which $m$ were true and
$n$ false. The probability of his telling the truth in a further case is then
$v = (m + 1)/(m + n + 2)$. If $p$ denotes the a priori probability of the event
whose happening is asserted, then the probability that the event did occur
given that the witness asserts it occurred, is


\[ pv/[pv + (1-p)(1-v)] , \qquad (15) \]


a fraction that is greater than $p$ when $v > \frac{1}{2}$. This remark is extended to
the case of $(n + 1)$ individuals reporting independently, it being shown that


the assertion of the $(n + 1)$-th individual increases the probability
of the event arising from the testimony of the other $n$
individuals, only when his veracity is greater than $\frac{1}{2}$. [art. 49]


Finally, in this matter, it is pointed out that when there are no data by
which the veracity $v$ of the individual may be determined, the expression
(15) should be replaced by


\[ \int_0^1 pv/[pv + (1-p)(1-v)]\,dv . \]


This is illustrated by a juratorial example.
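
The testimony formula and its averaged form can be sketched as follows. The range of integration (veracity uniformly distributed between 0 and 1) and the midpoint rule are assumptions made for this illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def veracity_from_record(m, n):
    """Estimated veracity of a witness with m true and n false past assertions."""
    return Fraction(m + 1, m + n + 2)


def credibility(p, v):
    """Probability that an asserted event occurred, for prior p and witness veracity v."""
    return p * v / (p * v + (1 - p) * (1 - v))


def averaged_credibility(p, steps=2000):
    """Midpoint-rule average of the credibility over veracities between 0 and 1."""
    h = 1.0 / steps
    return h * sum(credibility(p, (s + 0.5) * h) for s in range(steps))


v = veracity_from_record(9, 1)                 # 10/12 = 5/6
print(credibility(Fraction(1, 3), v))          # exceeds the prior 1/3, since v > 1/2
print(averaged_credibility(0.5))               # close to 1/2
```

A witness whose veracity is exactly 1/2 leaves the prior probability unchanged, which is the boundary case of the remark quoted above.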

An extension is now made to sampling with replacement from a bag
containing balls of $i$ different colours. If $m_1, m_2, \ldots, m_i$ balls have already
been drawn, the probability of drawing a further $n_1, n_2, \ldots, n_i$ balls (again
under an “equally probable” a priori assumption) will be


\[ \binom{n_1 + \cdots + n_i}{n_1, \ldots, n_i} \int \prod_{j=1}^{i} x_j^{m_j+n_j}\,dx \Big/ \int \prod_{j=1}^{i} x_j^{m_j}\,dx , \]


where $dx = dx_1\,dx_2 \ldots dx_{i-1}$, $\sum_{j=1}^{i} x_j = 1$ and the $(i - 1)$-fold integrals
are taken over the set


\[ \{(x_1, \ldots, x_{i-1}) : 0 \le x_j \le 1 - x_1 - \cdots - x_{j-1},\ j \in \{1, 2, \ldots, i-1\}\} . \]


Evaluation of the Dirichlet integrals (see Whittaker and Watson [1973,
§12.5]) yields³¹


\[ \binom{n_1 + \cdots + n_i}{n_1, \ldots, n_i} \frac{\prod_{j=1}^{i} \Gamma(m_j + n_j + 1)}{\Gamma\left(\sum_{1}^{i} (m_j + n_j) + i\right)} \Big/ \frac{\prod_{j=1}^{i} \Gamma(m_j + 1)}{\Gamma\left(\sum_{1}^{i} m_j + i\right)} . \]


Hence, as a special case, one finds that the probability of one further trial’s
yielding a ball of the $r$-th colour is³²

\[ (m_r + 1)/(m_1 + \cdots + m_i + i) , \]

where $r \in \{1, 2, \ldots, i\}$.

In section 33 of his Mémoire sur les probabilités of 1778 Laplace gave
a multinomial generalization of the Bayes prior: consider the outcomes
$X_1, X_2, \ldots, X_N$ of a $k$-category multinomial with prior $p = (p_1, p_2, \ldots, p_k)$.
Then the probability of the frequency count $n = (n_1, n_2, \ldots, n_k)$ is


\[ \Pr[n] = \frac{N!}{n_1! \cdots n_k!} \int \prod_{i=1}^{k} p_i^{n_i}\,dF(p) , \]


the integral being taken over $\{p : \sum_i p_i = 1\}$, with $dF(p) = dp_1 \ldots dp_{k-1}$.
As in the case of the binomial distribution, it follows, as Lubbock and
Drinkwater-Bethune show, that


\[ \Pr[X_{N+1} \in i\text{th category} \mid n] = (n_i + 1)/(N + k) . \]
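
The multicolour predictive probability can be computed exactly from the standard value of the Dirichlet integral, namely that the integral of $\prod_j x_j^{e_j}$ over the simplex equals $\prod_j e_j! \big/ (\sum_j e_j + k - 1)!$ for non-negative integer exponents. The sketch below relies on that identity; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def dirichlet_integral(exponents):
    """Exact integral of prod x_j**e_j over the simplex (uniform weight)."""
    k = len(exponents)
    return Fraction(prod(factorial(e) for e in exponents),
                    factorial(sum(exponents) + k - 1))


def predictive(drawn, further):
    """Chance of the further counts, colour by colour, given the counts already drawn."""
    coeff = factorial(sum(further)) // prod(factorial(t) for t in further)
    combined = [a + b for a, b in zip(drawn, further)]
    return coeff * dirichlet_integral(combined) / dirichlet_integral(drawn)


# Three colours, with 3, 1 and 0 balls drawn so far: chance that the next is of the first colour.
print(predictive([3, 1, 0], [1, 0, 0]))  # (3 + 1)/(4 + 3) = 4/7
```

Summing the single-draw probabilities over the colours gives unity, as it must.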


This same result was justified by W.E. Johnson [1932] in a manner similar
to that advanced by Bayes in his Scholium, and it in fact follows (see Zabell
[1982]) from Johnson’s Sufficientness Postulate, viz.


\[ \Pr[X_{N+1} \in i\text{th category} \mid n] = f(n_i, N) , \]

that there exists $\alpha > 0$ such that

\[ f(n_i, N) = (n_i + \alpha)/(N + k\alpha) , \]


a formula whose connexion with the continuum of inductive methods discussed
by Carnap [1952] is evident.

The multinomial generalization of the rule of succession mentioned here
appeared, albeit hazily, in the first edition of Laplace’s Théorie analytique
des probabilités of 1812. We have already glanced at the relevant passage
in §7.15.3: the full reference runs as follows:


Si on conçoit une urne renfermant une infinité de boules de plusieurs
couleurs différentes, et qu’après en avoir tiré un grand nombre
$n$, $p$ sur ce nombre, aient été de la première couleur, $q$ de la
seconde, $r$ de la troisième, etc.; en désignant par $x$, $x'$, $x''$, etc.
les probabilités respectives d’amener dans un seul tirage, une de
ces couleurs, la probabilité de l’événement observé sera le terme
qui a pour facteur $x^p . x'^q . x''^r$. etc., dans le développement du
polynome

\[ (x + x' + x'' + \text{etc.})^n , \]

où l’on a

\[ x + x' + x'' + \text{etc.} = 1 , \]
\[ p + q + r + \text{etc.} = n ; \]

on pourra donc supposer ici $y = x^p . x'^q . x''^r$. etc.; et alors on
a pour les valeurs de $x$, $x'$, $x''$, etc. qui rendent l’événement observé
le plus probable,

\[ x = p/n , \quad x' = q/n , \quad x'' = r/n , \quad \text{etc.} \]

Ainsi les valeurs les plus probables sont proportionnelles aux
nombres des arrivées des couleurs; et lorsque le nombre $n$ est
un grand nombre, les probabilités respectives des couleurs, sont
à très-peu près égales aux nombres de fois qu’elles sont arrivées,
divisés par le nombre des tirages. [1812, p. 369]


A version more accessible to the common man was given by de Morgan in 
his Essay on Probabilities of 1838. After giving the usual rule of succession, 
de Morgan supposes that an event A has occurred m times and an event B 
$n$ times on $m + n$ occasions, and that “the next event may be either A or B,
or a new species” [1838, p. 66]. Then the probability of A “against either
B or the new event” (loc. cit.) is $m + 1$ to $n + 2$, or $(m + 1)/(m + n + 3)$.

In Article 80 Lubbock and Drinkwater-Bethune mention Bayes’s Essay
and correctly state the main result; viz. the probability that the happening
of an event, which has already occurred $p$ times in $(p + q)$ experiments, has
a probability between $A$ and $a$ $(A < a)$ is


\[ \int_A^a x^p(1-x)^q\,dx \Big/ \int_0^1 x^p(1-x)^q\,dx . \]


They go on to make the astute observation that 


Bayes, or perhaps we should rather say Price, seems to have 
confounded the probability thus determined, with the proba- 
bility that an event which has been already observed mi [sic] 
times in p+ q experiments, will happen again. The difference 
between the two is obvious. [art. 80] 


This remark, as we have mentioned in an earlier chapter, is not accepted 
as correct by Todhunter [1865, art. 551]. 

In 1830 two papers by Lubbock on annuities were printed in the Transactions
of the Cambridge Philosophical Society, the first of these (and the
only one to concern us here) also containing some thoughts on probability.
In the second article of this paper it is supposed that, from a bag containing
a number (possibly, or necessarily, infinite) of balls of $p$ different
colours, $m_1 + m_2 + \cdots + m_p$ are drawn, where $m_1$ balls are of the first
colour, $m_2$ are of the second, etc. If $x_i$ denotes the probability that a ball
of the $i$-th colour is drawn in one trial, then the probability of the given
event is

\[ x_1^{m_1} x_2^{m_2} \cdots x_p^{m_p} \]

multiplied by the coefficient of this term in the expansion of

\[ (x_1 + x_2 + \cdots + x_p)^{m_1 + \cdots + m_p} . \]


If this event is observed, “the probability of this system of probabilities”
[p. 144] will be


\[ \binom{m_1 + \cdots + m_p}{m_1, \ldots, m_p} \prod_i x_i^{m_i} \Big/ \int \binom{m_1 + \cdots + m_p}{m_1, \ldots, m_p} \prod_i x_i^{m_i}\,dx , \]

with $dx = dx_1\,dx_2 \ldots dx_{p-1}$, the integration here, and elsewhere, being over
the set


\[ \{(x_1, \ldots, x_{p-1}) : 0 < x_i < 1 - x_1 - \cdots - x_{i-1},\ i \in \{1, 2, \ldots, p-1\}\} . \]


It follows as usual that the probability that $n_1 + \cdots + n_p$ subsequent
trials yield $n_1$ balls of the first colour, $n_2$ of the second, etc., will be


\[ \binom{N}{n_1, \ldots, n_p} \int \prod_i x_i^{m_i+n_i}\,dx \Big/ \int \prod_i x_i^{m_i}\,dx . \]


Lubbock’s evaluation of this last probability is easily seen to be expressible
as

\[ \prod_{i=1}^{p} \binom{m_i+n_i}{n_i} \Big/ \binom{M+N+p-1}{N} , \qquad (16) \]

where $M = \sum_1^p m_i$ and $N = \sum_1^p n_i$.

Several useful observations follow: firstly, “this probability is the same
as if the simple probability of drawing a ball of the $p$th colour were $m_p + 1$,
with the difference of notation” [p. 146]; secondly, if $n_p = 1$ and all other
$n_i$ are zero, the chance that a ball of the $p$-th colour is drawn is


\[ (m_p + 1)/(m_1 + \cdots + m_p + p) ; \]


and thirdly, the probability that the index of the colour drawn lies between
$(n - 1)$ and $(n + q + 1)$ is


\[ (m_n + m_{n+1} + \cdots + m_{n+q} + q + 1)/(m_1 + m_2 + \cdots + m_p + p) . \]


While the second and third of these observations are correct, the first
requires some attention. A similar remark in Lubbock and Drinkwater-
Bethune’s 1830 tract (for $p = 2$) shows that the expression $m_p + 1$ should
in fact be $(m_p + 1)/(M + p)$. For then the probability that $N$ draws yield
$n_1, n_2, \ldots, n_p$ balls is given by the multinomial probability

\[ \binom{N}{n_1, \ldots, n_p} (m_1+1)^{n_1} \cdots (m_p+1)^{n_p} \Big/ (m_1 + \cdots + m_p + p)^{n_1 + \cdots + n_p} , \]


which corresponds, with the substitution of square brackets for the parentheses,
to Lubbock’s formulation of (16) as


\[ \binom{N}{n_1, \ldots, n_p} [m_1+1]^{n_1} \cdots [m_p+1]^{n_p} \Big/ [m_1 + \cdots + m_p + p]^{n_1 + \cdots + n_p} , \]


where $[x + 1]^n = (x + 1)(x + 2) \ldots (x + n)$.
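
The agreement between the square-bracket form and a product-of-binomials form of the same probability can be confirmed by direct computation. In the sketch below the bracket $[y]^n$ is read as the rising product $y(y+1)\ldots(y+n-1)$, which is how the definition just given is interpreted here; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, prod


def bracket(y, n):
    """Rising product y (y + 1) ... (y + n - 1); equal to 1 when n is 0."""
    return prod(y + t for t in range(n))


def bracket_form(m, n):
    """Predictive probability written with square brackets (rising products)."""
    p, M, N = len(m), sum(m), sum(n)
    coeff = factorial(N) // prod(factorial(t) for t in n)
    top = prod(bracket(mi + 1, ni) for mi, ni in zip(m, n))
    return Fraction(coeff * top, bracket(M + p, N))


def binomial_form(m, n):
    """The same probability as a ratio of binomial coefficients."""
    p, M, N = len(m), sum(m), sum(n)
    top = prod(comb(mi + ni, ni) for mi, ni in zip(m, n))
    return Fraction(top, comb(M + N + p - 1, N))


m, n = [3, 1, 0], [1, 2, 1]
print(bracket_form(m, n), binomial_form(m, n))  # the two forms agree
```

For a single further draw the bracket form reduces to $(m_p + 1)/(M + p)$, the corrected expression discussed above.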
Several applications to annuities follow: we shall not pursue these here.
Attention is also paid to four problems: we shall briefly consider all of these.

The first problem runs as follows: find the probability of getting $n_1$ balls
of the first colour in $n_1 + N$ (further) trials, the colour of the other $N$ balls
being anything other than the first colour. The solution is given as


1 wag [mm + 1] [m2 +--++ mp, +p 1]% 


a ee srs oii par 


) 


ogi (i) meee 
N+) [m2 +---+mp+N4+p—1)rmtmti 


“which probability, as before, is the same as if the simple probability of
drawing a ball of the $p$th colour were $m_p + 1$” [p. 148] (see my previous
remarks).

In his second problem Lubbock supposes that $M = m_2 + m_3 + \cdots + m_p + p - 2$,
with $n_1 : N :: m_1 : M$. What is the chance that the number of balls
of the first colour in $n_1 + N$ trials lies between the limits $n_1$ and $n_1 + z$?
To solve this Lubbock uses a result from Laplace’s Théorie analytique des
probabilités, and the solution obtained can be written as


[fH f° 2 
9 =| erties de. 
2a J, 


_ (M + m,)? 
m,M(N +n1)(M+N+m,+71) 


In the third problem it is supposed that the “law of possibility” of $x_p$
is $\varphi_p(x_p)$ (i.e. no longer necessarily uniform). How are the earlier results
modified? Not very much can in fact be said, since no specific form of $\varphi_p$ is
assumed. However, in considering the application of his result to annuities,
Lubbock writes


ny 


where 


If the probability of life were known at a great many places,
and if $x_{p_1}$ were the value of $x_p$ at $q_1$ places, $x_{p_2}$ at $q_2$ places,
&c. the law of possibility might be determined approximately
by considering $\varphi_p x_p$ as a parabolic curve, of which $x_p$ is the
abscissa passing through the points, of which the ordinates are


\[ \frac{q_1}{q_1 + q_2 + \text{\&c.}} , \quad \frac{q_2}{q_1 + q_2 + \text{\&c.}} . \]


[pp. 149-150]


The final problem is devoted to the finding of the probability of any fu- 
ture event when the results of the preceding trials are uncertain. To answer 
this question, Lubbock supposes that m draws from a bag containing only 
black and white balls have been made, and that $e_n$ ($f_n$) is the probability
that a white (black) ball was drawn at the $n$th trial. The given argument
runs as follows:

First let $e_1, e_2, \ldots e_n$ be all equal, and let $x$ be the probability
of drawing a white ball. If a white ball was drawn every time in
the $m$ trials which have taken place, the probability in $n_1 + n_1$
(sic) future trials of having $n_1$ white balls, and $n_2$ black balls,
is

\[ \frac{(n_1+n_2)(n_1+n_2-1) \cdots (n_1+1)}{1.2 \ldots n_2} \, \frac{\int x^{m+n_1}(1-x)^{n_2}\,dx}{\int x^m\,dx} \]


But the probability that a white ball was drawn every time is
$e^m$: therefore, the probability of drawing a white ball $n_1$ times,
and a black ball $n_2$ times on this hypothesis, multiplied by the
probability of the hypothesis, is


\[ \frac{(n_1+n_2)(n_1+n_2-1) \cdots (n_1+1)}{1.2 \ldots n_2} \, e^m \, \frac{\int x^{m+n_1}(1-x)^{n_2}\,dx}{\int x^m\,dx} \]


and the probability of drawing $n_1$ white balls and $n_2$ black
balls will be the sum of the probabilities on every hypothesis,
multiplied respectively by the probability of the hypothesis. . .
[p. 150]


(Obvious misprints have been corrected.) Being uncertain as to whether $x$
and $e$ are supposed to be the same here, I append the following argument.
Suppose that $e_1 = e_2 = \cdots = e_m = e$, with $f = 1 - e$. Let $(r, s)$ denote the
event that $r$ white and $s$ black balls have been drawn. Then the probability
that $n_1 + n_2$ future draws will yield $n_1$ white and $n_2$ black balls given that
$m$ trials have resulted in $m$ white balls is


\[ \Pr[(n_1, n_2) \mid (m, 0)] = \binom{n_1+n_2}{n_1} \int_0^1 e^{m+n_1} f^{n_2}\,de \Big/ \int_0^1 e^m\,de . \]


But

\[ \Pr[(m, 0)] = e^m , \]

and hence


\[ \Pr[(n_1, n_2) \wedge (m, 0)] = e^m \times \binom{n_1+n_2}{n_1} \int_0^1 e^{m+n_1} f^{n_2}\,de \Big/ \int_0^1 e^m\,de . \]


On deriving similar expressions for $\Pr[(n_1, n_2) \wedge (r, s)]$ as $(r, s)$ runs
through the set $\{(m-1, 1), \ldots, (0, m)\}$, we find on summing all such expressions
that the probability $\Pr[(n_1, n_2)]$ that $n_1 + n_2$ future draws will
yield $n_1$ white and $n_2$ black balls is


\[ \binom{n_1+n_2}{n_1} \sum_{j=0}^{m} \binom{m}{j} e^{m-j} f^j \int_0^1 e^{m+n_1-j} f^{n_2+j}\,de \Big/ \int_0^1 e^{m-j} f^j\,de \]


\[ = \binom{n_1+n_2}{n_1} \frac{(m+1)!}{(m+n_1+n_2+1)!} \sum_{j=0}^{m} \binom{m}{j} \frac{(m+n_1-j)!\,(n_2+j)!}{(m-j)!\,j!} \, e^{m-j} f^j . \qquad (17) \]

Lubbock now ingeniously notes that the series in (17) is equal to


\[ \left. \frac{d^{n_1+n_2}}{dx^{n_1}\,dy^{n_2}} \, x^{n_1} y^{n_2} (ex + fy)^m \right|_{(1,1)} \qquad (18) \]

and remarks further that this derivative is equal to $n_1!\,n_2!$ multiplied by
the coefficient of $h^{n_1} k^{n_2}$ in the expansion of

\[ (1+h)^{n_1} (1+k)^{n_2} (1 + eh + fk)^m . \]

This in fact is equivalent to saying that the derivative in (18) is equal to

\[ \left. \frac{d^{n_1+n_2}}{dh^{n_1}\,dk^{n_2}} \, (1+h)^{n_1} (1+k)^{n_2} (1 + eh + fk)^m \right|_{(0,0)} . \]


Easily identifying the above-mentioned coefficient, Lubbock concludes that


\[ \Pr[(n_1, n_2)] = \frac{(n_1+n_2)!\,(m+1)!}{(m+n_1+n_2+1)!} \sum_j \sum_k \binom{m}{j} \binom{j}{k} \binom{n_1}{j-k} \binom{n_2}{k} e^{j-k} f^k . \]

This result is then extended to the consideration of balls of $p$ different
colours, the usual result being obtained³³.
This example is followed by an application to the veracity of witnesses:
we shall not pursue the matter further.
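
The two expressions for the probability of a future event when the past results are uncertain can be compared numerically. The sketch below is an illustration under stated assumptions: the single-sum form weights each possible past record ($m - j$ white, $j$ black) by its binomial probability and applies the usual Beta-integral predictive to it, while the double-sum form is the coefficient extraction described above. Both function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def series_form(m, n1, n2, e):
    """Single-sum form: mixture over the m + 1 possible past records."""
    f = 1 - e
    prefactor = Fraction(comb(n1 + n2, n1) * factorial(m + 1),
                         factorial(m + n1 + n2 + 1))
    series = sum(
        comb(m, j)
        * Fraction(factorial(m + n1 - j) * factorial(n2 + j),
                   factorial(m - j) * factorial(j))
        * e ** (m - j) * f ** j
        for j in range(m + 1)
    )
    return prefactor * series


def double_sum_form(m, n1, n2, e):
    """Double-sum form obtained from the coefficient of h**n1 * k**n2."""
    f = 1 - e
    prefactor = Fraction(factorial(n1 + n2) * factorial(m + 1),
                         factorial(m + n1 + n2 + 1))
    coefficient = sum(
        comb(m, j) * comb(j, k) * comb(n1, j - k) * comb(n2, k)
        * e ** (j - k) * f ** k
        for j in range(m + 1)
        for k in range(j + 1)
    )
    return prefactor * coefficient


e = Fraction(3, 4)
print(series_form(1, 1, 0, e), double_sum_form(1, 1, 0, e))  # both equal (1 + e)/3 = 7/12
```

With one uncertain past draw and one future draw the result is $(1 + e)/3$, that is, the rule-of-succession values 2/3 and 1/3 weighted by $e$ and $f$.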


8.3 Bernard Bolzano (1781-1848) 


In 1837 Bolzano’s Wissenschaftslehre appeared. Here the definition of logical
probability proposed by the author is seen as being in complete agreement
with that given by Laplace and Lacroix, and it is, moreover, a definition
in which probability is clearly seen as a relation between propositions³⁴.
But despite the importance of this book as a contribution to inductive prob- 
ability, and of the discussion of confidence, belief, and subjective probabil- 
ity to be found there, there is little that is directly relevant to our present 
theme. Indeed, the only pertinent point seems to be a brief use of the rule 
of succession in §379. Here Bolzano states that if the proposition A has
occurred $a$ times in $n$ cases, the probability that A is present in a further
case is $(a + 1)/(n + 2)$.


8.4 Augustus de Morgan (1806-1871)


Augustus de Morgan³⁵ was born in Madura, India, the year of his birth
being the solution of a conundrum he himself proposed³⁶, viz. “I was $x$ years
of age in the year $x^2$.” Despite a physical defect (according to MacFarlane
[1916], “one of his eyes³⁷ was rudimentary and useless” [p. 19]) de Morgan
was an indefatigable author: Peter Heath, in his 1966 edition of some of de
Morgan’s logical works, says that his output was “probably the largest of
any mathematician of his time” [p. ix]. Yet among this vast number³⁸ only
some half-a-dozen are devoted to probability in itself³⁹, and even in these
there is little that seems directly relevant.

The origin of the name “inverse probability” has been traced⁴⁰ by Arne
Fisher (1877-1944) to de Morgan’s Essay on Probabilities of 1838. However,
in an anonymous review of Laplace’s Théorie analytique des probabilités in
1837, a review⁴¹ generally attributed to de Morgan, we find a slightly earlier
reference to the notion in the words that in the science of probability


the problems which most naturally present themselves in prac- 
tice are of an inverse character, as compared with those which 
an elementary and deductive course first enables the student to 


solve. [p. 239] 


A yet earlier occurrence of the term is to be found in an outline of some
lectures given at the École Polytechnique in Paris in the late eighteenth or
early nineteenth century. It is not certain whether these lectures were given
by Fourier or Garnier, though they were certainly drawn up by the former.
The outline is today to be found in Fourier’s papers in the Bibliothèque
Nationale (see Crepel [1989c]), where we find the following summary:


Méthode inverse des probabilités. Règles

De la probabilité des causes prise des événemens, mesure de
cette probabilité.

De la probabilité des événemens futurs dont les causes sont
ignorées.

De la probabilité des événemens prise des événemens observés.
Remarques analytiques sur le calcul des fonctions de très grands
nombres.

Des cas où les événemens observés indiquent les causes avec
beaucoup de vraisemblance.
[Crepel 1989c, p. 37]


The next work of de Morgan’s that warrants attention is his An Essay 
on Probabilities and their Application to Life Contingencies and Insurance 
Offices of 1838, a volume described by Fisher [1926, p. 16] as “the first 
[English] work of importance” after the publication of de Moivre’s The 
Doctrine of Chances, and by Heath [1966, p. ix] as a first-rate elementary
text-book. The book is most interesting to read, and can be well recommended.

In the sixteen-page preface we find a discussion of the difficulties that
beset early investigators in probability, among which de Morgan mentions


the not having considered, or, at least, not having discovered, 
the method of reasoning from the happening of an event to the 
probability of one or other cause [p. vii],


and on the same page he specifically refers to “the want of an inverse 
method”, further elaboration being given as follows: 


De Moivre, nevertheless, did not discover the inverse method. 
This was first used by the Rev. T. Bayes, in Phil. Trans. liii.
370.; and the author, though now almost forgotten, deserves the 
most honourable remembrance from all who treat the history 
of this science. [p. vii] 


In Chapter I, entitled “On the notion of probability and its measurement;
on the province of mathematics with regard to it, and reply to objections”,
de Morgan speaks of the principle of the want of sufficient reason [p. 10] and
its occurrence in some simple situations. Also in this introductory chapter 
we find the sentiment 


causes are likely or unlikely, just in the same proportion that 
it is likely or unlikely that observed events should follow from 
them. The most probable cause is that from which the observed 
event could most easily have arisen [p. 27], 


an opinion that is discussed in a later chapter⁴². This form, in which the a
priori probabilities cancel, is described by Keynes [1921, chap. XVI, §14]
as the “uninstructed view”.

Asserting that probability questions may be of two different types, viz. 


1. Where we know the previous circumstances and require the 
probability of an event. 

2. Where we know the event which has happened, and require 
the probability which results therefrom to any particular set of 
circumstances under which it might have happened. 

The first I call direct, and the second inverse, questions 

[pp. 31-32], 


de Morgan devotes his second chapter to discussion of questions of the first 
type, and the third to those of the second: it is to this third chapter that 
we now turn our attention. 

At the outset de Morgan, having outlined the typical “argument from 
event to cause”, provides a precise definition of a cause as “simply a state of 
things antecedent to the happening of an event” [p. 53], and moreover limits
himself to cases involving merely a finite number of antecedent possible
states. As a first illustration he considers the case of four urns, A containing 
three black balls, B containing one white and two black, C containing two 
white and one black, and D three white. Under the assumption that each 
urn has probability 1/4 of being chosen, he deduces that, after a white ball 
has been drawn, the a posteriori probabilities of A, B, C and D are 0, 1/6,
2/6 and 3/6. As a second illustration he examines the case in which two 
urns contain different numbers of balls, and answers a question like that 
posed above. 
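
The four-urn illustration can be checked by direct enumeration. The sketch below is ours and not de Morgan's: the helper name `posterior_equal_priors` is hypothetical, and exact fractions are used so that the values 0, 1/6, 2/6 and 3/6 are reproduced without rounding.

```python
from fractions import Fraction

def posterior_equal_priors(likelihoods):
    """Divide each likelihood by the sum of all of them (equal priors assumed)."""
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]

# Chance of drawing a white ball from urns A, B, C, D,
# which hold 0, 1, 2 and 3 white balls out of 3 respectively.
white_chance = [Fraction(k, 3) for k in range(4)]
posterior = posterior_equal_priors(white_chance)
print(posterior)
```

Because the four urns are taken as equally likely, the prior probability 1/4 cancels and never needs to appear in the computation.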
Next we find the following basic postulate: 


When an event has happened, and the state of things under 
which it happened must have been one out of the set A, B, C, D,
&c., take the different states for granted, one after the other,
and ascertain the probability that, such state existing, the event 
which did happen would have happened. Divide the probability 
thus deduced from A by the sum of the probabilities deduced 
from all, and the result is the probability that A was the state 
which produced the event: and similarly for the rest. [pp. 55-56] 


Note the tacit assumption that the initial circumstances are equally
probable. This principle is followed by some more simple examples involving
lotteries and testimony, and de Morgan then turns his attention to a prob- 
lem in which the urns have unequal probabilities of being drawn: suppose 
that two urns contain 3 white and 4 black, and 2 white and 7 black balls 
respectively, and that the first urn is three times as likely to be drawn as 
the second. The method of solution proposed is curious: de Morgan intro- 
duces two further urns, each of the same composition as the first, which 
then results in a situation capable of being handled by the earlier principle. 
He follows this by giving (in words) the rule 


\[ \Pr[H_i \mid E] = \Pr[E \mid H_i]\,\Pr[H_i] \Big/ \sum_j \Pr[E \mid H_j]\,\Pr[H_j] , \]


and illustrates it with a further simple example. 
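
De Morgan's device of duplicating the more probable urn can be compared with the general rule by a short computation. This is a sketch under our own assumptions: the observed event is taken to be the drawing of a white ball (the text above does not restate the event), and the function name `posterior` is hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """General rule: prior times likelihood, divided by the sum of such products."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Urn 1: 3 white, 4 black; urn 2: 2 white, 7 black.
# Assumed event: a white ball is drawn.
lik = [Fraction(3, 7), Fraction(2, 9)]

# Direct use of the rule, urn 1 being three times as likely as urn 2.
direct = posterior([Fraction(3, 4), Fraction(1, 4)], lik)

# De Morgan's device: three equally likely copies of urn 1 and one urn 2.
copies = posterior([Fraction(1, 4)] * 4, [lik[0]] * 3 + [lik[1]])
via_copies = [sum(copies[:3]), copies[3]]

print(direct, via_copies)
```

Both routes give the same posterior probabilities, which is why the curious method of solution is nevertheless sound.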
A discussion of what is essentially the rule of succession provides a heuris- 
tic for the following principle: 


Having given an observed event A, to find the probability which 
it affords to the supposition that a coming event shall be B, find 
the probability which A gives to every possible preceding state; 
multiply each probability thus obtained by the chance which B 
would have from that state, and add the results together. 

[pp. 60-61] 


This may be written symbolically as 


\[ \Pr[B \mid A] = \sum_i \Pr[H_i \mid A]\,\Pr[B \mid H_i] , \]

an expression that is true provided that A and B are conditionally independent
given each $H_i$. This is followed by some further lottery examples,
following which the general result is stated that if A and B have happened
m and n times respectively, the probability that the next event will be an
A is (m + 1)/(m + n + 2), this being based on the consideration that the
antecedent probabilities of the events may be anything whatever⁴³. Similarly,
the probability that a further (p + q) events will result in p occurrences of
A and q of B is given as


\[ \binom{m+p}{p}\binom{n+q}{q} \bigg/ \binom{m+n+p+q+1}{p+q} , \]

and this is illustrated once again by a lottery. 
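
The agreement between the binomial-ratio form and the underlying ratio of integrals may be verified exactly. The following is a sketch assuming a uniform prior on the unknown chance of A; the function names are ours.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(i, j):
    """Exact value of the integral of x^i (1-x)^j over [0, 1]."""
    return Fraction(factorial(i) * factorial(j), factorial(i + j + 1))

def predictive(m, n, p, q):
    """Pr[p A's and q B's in the next p+q trials | m A's and n B's observed],
    computed as a ratio of integrals under a uniform prior."""
    return comb(p + q, p) * beta_integral(m + p, n + q) / beta_integral(m, n)

def binomial_ratio(m, n, p, q):
    """The same probability written as a ratio of binomial coefficients."""
    return Fraction(comb(m + p, p) * comb(n + q, q),
                    comb(m + n + p + q + 1, p + q))

print(predictive(3, 2, 1, 0), binomial_ratio(3, 2, 2, 1))
```

Setting p = 1 and q = 0 recovers the rule of succession (m + 1)/(m + n + 2).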


As has already been mentioned (see §8.2) de Morgan also provided a 
multinomial generalization of this result, arguing as follows: 


suppose we have no reason, except what we gather from the 
observed event, to know that A or B must happen; that is, 
suppose C or D, or E, &c. might have happened: then the next
event may be either A or B, or a new species, of which it can
be found that the respective probabilities are proportional to
m + 1, n + 1, and 1; so that the odds remain m + 1 to n + 1
for A rather than B, yet it is now m + 1 to n + 2 for A against
either B or the new event. [p. 66] 


The general method for finding the probabilities in such a case is given 
thus: 


When a number of different events have happened, A, B, C,
&c., write down each number increased by 1, and the results 
will express the several relative probabilities, on the supposition 
that no events can happen except those which have happened. 
But if new events may happen, write down 1 for the relative 
probability of such an occurrence at the next trial. [p. 67] 
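
De Morgan's rule for several kinds of event is easily mechanized. The sketch below is only an illustration of the rule as quoted; the function name and the label "new" for a hitherto unobserved species are our own.

```python
from fractions import Fraction

def next_event_probabilities(counts, allow_new=False):
    """Relative weight of each observed event is its count increased by 1;
    a possible new species (hypothetical label 'new') receives weight 1."""
    weights = {name: k + 1 for name, k in counts.items()}
    if allow_new:
        weights["new"] = 1
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

# A has happened m = 3 times and B has happened n = 1 time.
closed = next_event_probabilities({"A": 3, "B": 1})
open_ended = next_event_probabilities({"A": 3, "B": 1}, allow_new=True)
print(closed, open_ended)
```

With m = 3 and n = 1 the odds for A rather than B are 4 to 2 in both cases, while the odds for A against either B or a new event become 4 to 3, that is m + 1 to n + 2.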


In Chapter IV, “Use of the tables at the end of this work”, we find 
the suggestion that, when the chances of A and B are known, we may well 
suppose that in a large number of trials A and B will occur in proportion to 
their respective probabilities. Several problems of a direct nature (i.e. when 
the a priori probabilities are supposed known) are solved using the tables, 
which are seen to be based on cumulative frequencies for the probability 
density function defined by


\[ y = \frac{2}{\sqrt{\pi}}\, 10^{-x^2/\theta} , \qquad x > 0 \qquad (19) \]


where θ = ln 10.
Attention is next turned to inverse problems, the first considered being 


the following:


In a + b trials A has happened a times and B b times: from which,
if a and b be considerable numbers, it is safe to infer that it is a
to b nearly for A against B. What is the presumption that the
odds for A against B really lie between a − k to b + k and a + k
to b − k? [p. 83]


Denoting by P(A) the (unknown) probability of A, we may write the
quaesitum as


\[ \Pr\bigl[\,|P(A) - a/(a+b)| < k/(a+b)\,\bigr] , \]


a probability that de Morgan states, though not in so many words, may be 
approximated by integration of the function y given in (19) above from 0 to
k/√(2ab/(a + b)). This in fact, making allowances for the different densities,
coincides with Laplace’s inversion of Bernoulli’s Theorem (cf. §7.15.2 and 
Keynes [1921, chap. XXX]), according to which 


\[ \Pr\Bigl[\,|P(A) - a/(a+b)| < \gamma\sqrt{2ab/(a+b)^3}\,\Bigr] = \frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt . \]


Similar problems, with different limits, are also considered. In each case 
de Morgan’s commendable intent to provide results readily accessible to 
the enquiring layman leads to the avoidance of mathematical notation as 
far as possible, resulting in the provision of “Rules” with practically no 
justification. 
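
The quality of the approximation may be gauged numerically. The sketch below assumes a uniform prior, so that the posterior density of P(A) is proportional to x^a (1 − x)^b, and integrates it by a simple midpoint rule; the function name and the numerical values of a, b and k are ours, chosen only for illustration.

```python
from math import erf, exp, lgamma, log, sqrt

def beta_posterior_mass(a, b, lo, hi, steps=20000):
    """Midpoint-rule integral over [lo, hi] of the normalized density
    proportional to x^a (1-x)^b (uniform prior assumed)."""
    log_norm = lgamma(a + b + 2) - lgamma(a + 1) - lgamma(b + 1)
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += exp(log_norm + a * log(x) + b * log(1.0 - x))
    return total * h

a, b, k = 600, 400, 30                       # illustrative values only
centre = a / (a + b)
exact = beta_posterior_mass(a, b, centre - k / (a + b), centre + k / (a + b))

# Upper limit of integration k / sqrt(2ab/(a+b)); erf(g) is (2/sqrt(pi)) times
# the integral of exp(-t^2) from 0 to g.
gamma = k / sqrt(2 * a * b / (a + b))
approx = erf(gamma)
print(exact, approx)
```

For these values the two figures agree to within about one part in a thousand, which is in keeping with the asymptotic character of the rule.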

In another work also published in 1838 (but read February 26, 1837) and 
entitled “On a question in the theory of probabilities”, de Morgan corrects
“an oversight” made by both Laplace and Poisson in their discussion of
Bernoulli’s Theorem. Both these savants deduced that, if $A_n$ denotes the
number of times an event A occurs in n trials, then


\[ \Pr\bigl[\,|A_n - np| < \ell\,\bigr] = \frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt + \frac{\sqrt{n}}{\sqrt{2\pi vw}}\, e^{-\gamma^2} , \qquad (20) \]


where γ = ℓ√(n/2vw), with n = v + w where v and w “are proportional to
the chances of arrival or non-arrival in a single trial” (de Morgan [1838b,
p. 423]). De Morgan points out that both Laplace and Poisson inferred that
(20) therefore represented the same probability when p was unknown.
After detailed attention to the approximation of the ratio 


\[ \int x^{v}(1-x)^{w}\,dx \bigg/ \int_0^1 x^{v}(1-x)^{w}\,dx , \]


the approximation following Laplace’s method, de Morgan finds that 


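\[ \Pr[\omega - \mu\alpha < P(A) < \omega] = \frac{1}{\sqrt{\pi}}\int_0^{\mu} e^{-t^2}\,dt + \beta\,(e^{-\mu^2} - 1) \]

\[ \Pr[\omega - \mu\alpha < P(A) < \omega + \mu\alpha] = \frac{2}{\sqrt{\pi}}\int_0^{\mu} e^{-t^2}\,dt \]

\[ \Pr[\omega < P(A) < \omega + \mu\alpha] = \frac{1}{\sqrt{\pi}}\int_0^{\mu} e^{-t^2}\,dt - \beta\,(e^{-\mu^2} - 1) , \]


where α = √(2ω(1 − ω)/n) and β = √2 (1 − 2ω)/(3√(nπω(1 − ω))).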

In an Addition to this paper de Morgan notes that (20) is unnecessarily
complex: if ℓ is replaced by (ℓ + ½) one finds, neglecting terms of the same
order as those disregarded by Laplace, that


\[ \Pr[\,v - \ell \le A_n \le v + \ell\,] = \frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt , \]


where γ = (ℓ + ½)√(n/2vw).
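
The effect of replacing ℓ by ℓ + ½ can be seen in a small numerical experiment. This is a sketch under our own assumptions: v = np and w = n(1 − p) are taken as the expected numbers of arrivals and non-arrivals, the interval is read as including its end points, and the function names and numerical values are hypothetical.

```python
from math import comb, erf, sqrt

def exact_prob(n, p, ell):
    """Exact binomial probability that A_n lies within ell of np (np an integer)."""
    centre = round(n * p)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(centre - ell, centre + ell + 1))

def erf_approx(n, p, ell, shift=0.5):
    """(2/sqrt(pi)) * integral of exp(-t^2) from 0 to (ell + shift)*sqrt(n/(2vw))."""
    v, w = n * p, n * (1 - p)
    return erf((ell + shift) * sqrt(n / (2 * v * w)))

n, p, ell = 100, 0.5, 5                      # illustrative values only
exact = exact_prob(n, p, ell)
with_half = erf_approx(n, p, ell)            # de Morgan's simplification
without_half = erf_approx(n, p, ell, shift=0.0)
print(exact, with_half, without_half)
```

In this example the integral with upper limit built from ℓ + ½ reproduces the exact value closely, whereas the integral alone, with ℓ unaltered, falls noticeably short.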

In his extensive article “Theory of probabilities” in the Encyclopedia
Metropolitana⁴⁴ of 1843 de Morgan once again discusses “inverse principles,
in which we reason from known events to probable causes” [§18]. Two such 
principles are given: the first runs as follows: 


Principle IV. Knowing the probability of a compound event, 
and that of one of its components, we find the probability of the 
other by dividing the first by the second. This is a mathematical 
result of the last too obvious to require further proof. [§18] 


If what is meant is Pr[A] = Pr[A ∧ B]/Pr[B], then some independence
assumption is lacking. (The same assertion is given as Principle III in the
Essay on Probabilities, again with no mention of independence⁴⁵.) The
second principle mentioned is the following: 


Principle V. When an event has happened, & may have hap- 
pened in 2 or 3 different ways, that way which is most likely 
to bring about the event is most likely to have been the cause. 


[§19]


We also find here (essentially) the result


\[ \Pr[H_i \mid E] = \Pr[E \mid H_i] \Big/ \sum_j \Pr[E \mid H_j] \qquad (21) \]


illustrated by a “balls and urn” example of the usual sort. 

Some references to inverse methods may also be found in de Morgan’s 
Formal Logic of 1847. Perhaps the first item of interest here, after we have 
noted his saying “I throw away objective probability altogether” [p. 173], is 
de Morgan’s “ordinary rule” for computing a probability, given in his ninth 
chapter, “On Probability”, as follows: 


When all the things that can happen can be resolved into a number
of equally probable (or credible) cases, some favourable and
some unfavourable to the event under consideration, then the
fraction which the favourable cases are of all the cases, measures 
the probability (or credibility) of the arrival of the event: and 
the fraction which the unfavourable cases are of all the cases, 
measures the probability (or credibility) of the non-arrival. 

[p. 184] 


Notice that de Morgan refers here to the fraction as a measure of the 
probability itself: a similar approach may be found in the article “De la 
probabilité” in Laplace’s Essai philosophique sur les probabilités (see Dale 
[1995] for comment), though both these authors in fact take the ratio to be 
the probability. 

After considering some problems in direct probability, de Morgan turns 
his attention to inverse questions. The first, concerned with our feelings 
when a white ball is drawn (at random) from one of two urns (that urn 
also being chosen at random), prompts the following remark: 


This inversion of circumstances, this conclusion that the circum- 
stances under which the event did happen, are most probably 
those which would have been most likely to bring about the 
event, is of the utmost evidence to our minds. [p. 188] 


This is closely followed by what is a sort of discrete Bayes’s Theorem, viz.


If the probability of the observed event, supposed still future, 
from the several possible precedents, severally supposed actually
to exist, be a, b, c, &c: then, when the event is known to have
happened, the probabilities that it happened from the several
precedents are

a/(a + b + c + ···) for the first, b/(a + b + c + ···) for the second, &c.

[p. 190]
We may write this result, one to which Hailperin refers as “a weak form of
the inverse probability principle” [1988, p. 158], as (21) above, under the
assumptions of


(i) the mutual exclusiveness of the ‘precedents’ $H_i$;
(ii) the equality of the ‘prior’ probabilities $\Pr[H_i]$; and
(iii) the implication of $H_1 \vee H_2 \vee \cdots \vee H_n$ by E.


Now it is interesting to note that, unlike Bayes’s problem or its solution as 
given in his tenth proposition (see §§3.3 and 3.4), this result of de Morgan’s 
refers explicitly to the occurrence of a future event. It is in fact perhaps 
more in line with Laplace’s work (see §7.3) than with Bayes’s result, though 
one might well bear in mind our discussion in §4.3 on the timing of events.

Following close on the heels of the above result is the following:


Again, if there be several events, which are not all that could 
have happened; and if, by a new arrangement (or by additional 
knowledge of old ones) we find that these several events are now 
made all that can happen, without alteration of their relative 
credibilities: their probabilities are found by the same rule. If 
a, b, c, &c. be the probabilities of the several events, when not
restricted to be the only ones: then, after the restriction, the
probability of the first is a ÷ (a + b + ···), of the second,
b ÷ (a + b + ···) and so on. [p. 190]


My interpretation of this result runs as follows: consider events $A_1, \ldots, A_n$
with $\Pr[A_i] = p_i$. If, as the result of some action, we get


\[ \Pr[A_i] = \begin{cases} q_i , & i \in \{1, \ldots, m\} \\ 0 , & i \in \{m+1, \ldots, n\} , \end{cases} \]


where, for $i, j \in \{1, 2, \ldots, m\}$, we must have $q_i : q_j :: p_i : p_j$, then


\[ q_i = p_i \Big/ \sum_{j=1}^{m} p_j \]


(compare Donkin’s work, discussed here in §8.16)⁴⁶.

Like so many other 19th-century writers, de Morgan could not resist the 
lure offered to him by the subject of testimony, and several pages of his 
Formal Logic are devoted to this topic. 

De Morgan in fact differentiates between argument and testimony⁴⁷:


There are two sources of conviction, argument and testimony, 
reason why the thing should be, statement that the thing is 


[p. 191], 


and the six problems of this chapter are concerned with these two topics, 
sometimes singly and sometimes together. We shall consider all these prob- 
lems here, though not in as much detail as in the original: much insight can 
be obtained from Hailperin [1988, §7] & [1996, §2.3]. 


Problem 1. 

There are independent testimonies to the truth of an assertion, 
of the value μ, ν, ρ, &c. (one of them being the initial testimony
of the mind itself which is to form the judgment): required the 
value of the united testimony. [p. 195]


Under the restrictions that all testimonies are right, or all wrong, de Morgan
gives his solution as


\[ \frac{\mu\nu\rho\cdots}{\mu\nu\rho\cdots + (1-\mu)(1-\nu)(1-\rho)\cdots} \]


(notation slightly altered). 

As Hailperin [1988, p. 160] has noted, there is room for confusion about 
de Morgan’s phrase “the value of the united testimony”, which, for two 
witnesses, becomes 


\[ \mu\nu / [\mu\nu + (1-\mu)(1-\nu)] . \qquad (22) \]

Following Hailperin (op. cit.) let us set

$T_i = T(W_i, A)$ = witness $W_i$ testifies to the truth of ‘A’

for $i \in \{1, 2\}$. Then

\[ \mu = \Pr[A \mid T_1] , \qquad \nu = \Pr[A \mid T_2] , \]

and (22) becomes

\[ \begin{aligned}
& \frac{\Pr[A \mid T_1]\Pr[A \mid T_2]}{\Pr[A \mid T_1]\Pr[A \mid T_2] + \{1 - \Pr[A \mid T_1]\}\{1 - \Pr[A \mid T_2]\}} \\
&\quad = \frac{\Pr[A \mid T_1]\Pr[A \mid T_2]}{\Pr[A \mid T_1]\Pr[A \mid T_2] + \Pr[\bar{A} \mid T_1]\Pr[\bar{A} \mid T_2]} . \qquad (23)
\end{aligned} \]

However, if “the value of the united testimony” is taken as $\Pr[A \mid T_1 T_2]$,
then we have

\[ \Pr[A \mid T_1 T_2] = \frac{\Pr[A T_1 T_2]}{\Pr[A T_1 T_2] + \Pr[\bar{A} T_1 T_2]} , \]

coincidence of which with (23) requires some assumption of independence.
We shall not consider de Morgan’s illustrations of and thoughts on the
above result, apart from noting that some consideration is given to the
probability of collusion between the witnesses.
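
The role of the independence assumption can be made concrete with a small model. The sketch below is ours and goes beyond anything stated by de Morgan or Hailperin: it assumes that the two testimonies are independent both given A and given its contradictory, and the function names and numerical values are hypothetical.

```python
from fractions import Fraction

def united(mu, nu):
    """De Morgan's value of the united testimony of two witnesses, as in (22)."""
    return mu * nu / (mu * nu + (1 - mu) * (1 - nu))

def posteriors(prior, t1, t2):
    """Return (Pr[A|T1], Pr[A|T2], Pr[A|T1 T2]) in an assumed model where T1 and
    T2 are independent given A and given not-A.
    t1 = (Pr[T1|A], Pr[T1|not A]); t2 likewise."""
    def post(given_a, given_not_a):
        return prior * given_a / (prior * given_a + (1 - prior) * given_not_a)
    return post(*t1), post(*t2), post(t1[0] * t2[0], t1[1] * t2[1])

t1 = (Fraction(9, 10), Fraction(3, 10))
t2 = (Fraction(4, 5), Fraction(2, 5))

mu, nu, joint = posteriors(Fraction(1, 2), t1, t2)        # prior Pr[A] = 1/2
mu4, nu4, joint4 = posteriors(Fraction(1, 4), t1, t2)     # prior Pr[A] = 1/4
print(united(mu, nu), joint, united(mu4, nu4), joint4)
```

In this model de Morgan's expression coincides with Pr[A | T₁T₂] when the prior probability of A is 1/2, but not when it is 1/4, so that conditional independence alone does not suffice for the coincidence.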


Problem 2. 

Let there be any number of different assertions, of which one 
must be true, and only one: or of which one may be true, and 
not more than one: or of which any given number may be true, 
but not more: required the probability of any one possible case. 


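[p. 200]


One of the solutions⁴⁸ is one that we might write as


\[ \begin{aligned}
& \Pr[A \mid A\bar{B}\bar{C}\bar{D} \vee \bar{A}B\bar{C}\bar{D} \vee \bar{A}\bar{B}C\bar{D} \vee \bar{A}\bar{B}\bar{C}D] \\
&\quad = \frac{\Pr[A\bar{B}\bar{C}\bar{D}]}{\Pr[A\bar{B}\bar{C}\bar{D}] + \Pr[\bar{A}B\bar{C}\bar{D}] + \Pr[\bar{A}\bar{B}C\bar{D}] + \Pr[\bar{A}\bar{B}\bar{C}D]} ,
\end{aligned} \]

though once again independence is needed to pass from this result to de
Morgan’s

\[ \frac{\mu\nu'\rho'\sigma'}{\mu\nu'\rho'\sigma' + \mu'\nu\rho'\sigma' + \mu'\nu'\rho\sigma' + \mu'\nu'\rho'\sigma} , \]

where μ′ = 1 − μ, etc.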
This concludes the problems concerned purely with testimony: the next 
two questions are concerned with arguments. 


Problem 3. 

Arguments being supposed logically good, and the probabilities 
of their proving their conclusions (that is, of all their premises 
being true) being called their validities, let there be a conclu- 
sion for which a number of arguments are presented, of validi- 
ties a,b,c, &c. Required the probability that the conclusion is 
proved. [p. 201] 


The first thing to note is the difference between this kind of problem and 
those discussed before; indeed, de Morgan writes 




Testimonies are all true together or all false together: but one 
of the arguments may be perfectly sound, though all the rest 
be preposterous. [p. 201] 


Once again the solution proposed, viz. 


(i) Pr[all arguments fail to prove the conclusion] = (1 − a)(1 − b)(1 − c)···;


(ii) Pr[not all arguments fail] = 1 − (1 − a)(1 − b)(1 − c)···,


relies heavily on an unstated assumption of independence⁴⁹.


Problem 4. 

A conclusion and its contradiction being produced, one or the
other of which must be true, and arguments being produced
on both sides, required the probability that the conclusion is
proved, disproved (i. e. the contradiction proved), or left neither
proved nor disproved. [p. 203]


Letting C denote the conclusion and A and B the respective combined
arguments for $C$ and $\bar{C}$, we have Pr[A] = a, Pr[B] = b, and

\[ \begin{aligned}
\Pr[A\bar{B} \mid \overline{AB}] &= \frac{\Pr[A\bar{B}]}{\Pr[A\bar{B}] + \Pr[\bar{A}B] + \Pr[\bar{A}\bar{B}]} \\
&= \frac{a(1-b)}{a(1-b) + (1-a)b + (1-a)(1-b)} , \qquad (24)
\end{aligned} \]

where once again independence of A and B is needed. (Similar results
obtain for $\Pr[\bar{A}B \mid \overline{AB}]$ and $\Pr[\bar{A}\bar{B} \mid \overline{AB}]$.)

Hailperin [1988, p. 164] finds this solution unacceptable on the following
grounds: while $A \to C$ and $B \to \bar{C}$ are required by the hypotheses, the
“given” argument $\overline{AB}$ in (24) is not the complete condition needed; for
$(A \to C)(B \to \bar{C})$ implies $\overline{AB}$, but not conversely⁵⁰.


Problem 5. 
Given both testimony and argument to both sides of a contra- 
diction, one side of which must be true, required the probability 
of the truth of each side. 

This is the most important of our cases, as representing all 
ordinary controversy. [pp. 204-205] 


To solve this problem, de Morgan firstly collects all the testimonies
together, denoting their combined force for the first side by μ (and the force
for the other side by 1 − μ therefore). The probabilities that the first and
second sides are proved (to be true) by one or more of the arguments are
taken to be a and b respectively. The probabilities of the two sides are then
in the proportion μ(1 − b) : (1 − μ)(1 − a), because


for the truth of either side, it is not essential that the argument 
for it should be valid, but only that the argument against it 
should be invalid. [p. 205] 


It thus follows that 


\[ \Pr[\text{1st side}] = \frac{\mu(1-b)}{\mu(1-b) + (1-\mu)(1-a)} \]

\[ \Pr[\text{2nd side}] = \frac{(1-\mu)(1-a)}{\mu(1-b) + (1-\mu)(1-a)} . \]

Again Hailperin (loc. cit.) finds de Morgan’s solution to be inadequate⁵¹,
since only part of the condition $(A \to C)(B \to \bar{C})$ is considered.
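
De Morgan's formulae for Problems 3 and 5 may be evaluated as follows. This is merely a sketch of the rules as he states them, with independence assumed throughout (as it tacitly is in the original); the function names and the numerical values are ours.

```python
from fractions import Fraction
from math import prod

def proved(validities):
    """Problem 3: chance that not all of the arguments fail,
    the arguments being treated as independent."""
    return 1 - prod(1 - v for v in validities)

def sides(mu, a, b):
    """Problem 5: probabilities of the two sides, given combined testimony mu
    for the first side and argument validities a (first side), b (second side)."""
    first = mu * (1 - b)
    second = (1 - mu) * (1 - a)
    return first / (first + second), second / (first + second)

p3 = proved([Fraction(1, 2), Fraction(1, 3)])
first_side, second_side = sides(Fraction(1, 2), Fraction(2, 3), Fraction(1, 4))
print(p3, first_side, second_side)
```

Note that in `sides` it is the failure of the opposing argument, and not the validity of the supporting one, that enters each term, in accordance with the passage quoted from p. 205.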
De Morgan’s final problem, with its solution, runs as follows⁵²:


Problem 6. 

Given an assertion, A, which has the probability a; what does 
that probability become, when it is made known that there is 
the probability m that B is a necessary consequence of A, B 
having the probability 6? And what does the probability of B 
then become? 

First, let A and B not be inconsistent. The cases are now as 
follows, with respect to A. Either A is true, and it is not true 
that both the connexion exists and B is false: or A is false. This 
is much too concise a statement for the beginner, except when
it is supposed left to him to verify it by collecting all the cases.
The odds for the truth of A, either as above or by the collection,
are a{1 − m(1 − b)} to 1 − a. As to B, either B is true, or B is
false and it is not true that A and the connexion are both true.
Accordingly, the odds for B are as b to (1 − b)(1 − ma). [p. 209]


On this solution Hailperin [1988] comments 


We are hard put to make this passage cogent, even laying aside 
De Morgan’s neglect of consistency requirements regarding the 
values a,b and m. [p. 166] 


I share his views: nevertheless let me attempt to show how de Morgan’s 
solutions might be arrived at. 
Notice firstly that for any suitable entities p, q and r,

\[ p \wedge \neg(p \wedge q \wedge r) = p \wedge (\bar{q} \vee \bar{r}) . \]

Considering firstly the case of B, notice that, if B is true, then

\[ \Pr[B] = b . \]

If, on the other hand, “B is false and it is not true that A and the connexion
are both true”, then, firstly,

\[ \Pr[\bar{B}] = 1 - b , \qquad (25) \]

and secondly, from

\[ \neg[p \wedge (p \to q)] = \neg[p \wedge (\bar{p} \vee q)] = \neg(p \wedge q) \]

it follows that

\[ \Pr[\neg(A \wedge (A \to B))] = \Pr[\neg(A \wedge B)] = 1 - \Pr[A \wedge B] . \]
For brevity, let $Q_1$ denote the statement

it is not true that A and the connexion are both true.

Then, assuming some necessary independence, we have

\[ \begin{aligned}
\Pr[B \text{ false} \;\&\; Q_1] &= \Pr[\bar{B} \wedge \neg(A \wedge (A \to B))] \\
&= \Pr[\bar{B}]\,\Pr[\neg(A \wedge (A \to B))] \\
&= (1-b)(1 - \Pr[A \wedge B]) . \qquad (26)
\end{aligned} \]

Now de Morgan claims that the odds in favour of B are as

\[ b : (1-b)(1-ma) . \]

It thus follows from (25) and (26) that

\[ 1 - ma = 1 - \Pr[A \wedge B] , \]

whence we get

\[ m = \Pr[A \wedge B]/\Pr[A] = \Pr[B \mid A] . \]

Thus it seems that de Morgan is interpreting our $\Pr[A \to B]$ as $\Pr[B \mid A]$.
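The case of A is more complicated. Firstly, the term “either A is false”
gives

\[ \Pr[\bar{A}] = 1 - a . \qquad (27) \]

Let $Q_2$ denote the statement

it is not true that both the connexion exists and B is false.

That

\[ \Pr[A] = a \]

when A is true, is evident. Next, notice that

\[ \begin{aligned}
p \wedge \neg(q \wedge \bar{r}) &\Leftrightarrow p \wedge (\bar{q} \vee r) \\
&\Leftrightarrow p \wedge \neg(p \wedge q \wedge \bar{r}) \\
&\Leftrightarrow p \wedge \neg(q \wedge \overline{\bar{p} \vee r}) \\
&\Leftrightarrow p \wedge \neg(q \wedge \neg(p \to r)) \\
&\Leftrightarrow p \wedge (\bar{q} \vee (p \to r)) \\
&\Leftrightarrow p \wedge (\bar{q} \vee (\bar{p} \vee r)) \\
&\Leftrightarrow p \wedge (\bar{q} \vee \bar{p} \vee r) \\
&= p \wedge \neg(q \wedge p \wedge \bar{r}) .
\end{aligned} \]

Thus, again with an appropriate assumption of independence, we have,
with C denoting “the connexion exists”,

\[ \Pr[A \wedge Q_2] = \Pr[A \wedge \neg(C \wedge \bar{B})] = \Pr[A \wedge \neg(C \wedge A \wedge \bar{B})] . \]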




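Recalling that

\[ \Pr[E\bar{F}] = \Pr[E] - \Pr[EF] , \]

we note that $\Pr[A \wedge Q_2]$ may be rewritten as

\[ \begin{aligned}
\Pr[A \wedge Q_2] &= \Pr[A] - \Pr[C \wedge A \wedge \bar{B}] \\
&= \Pr[A] - \Pr[C]\,\Pr[A]\,\Pr[\bar{B}] \\
&= a\{1 - m(1-b)\} . \qquad (28)
\end{aligned} \]

From (27) and (28) it follows that the odds in favour of A are as

\[ a\{1 - m(1-b)\} : 1 - a . \]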
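
The propositional identities used in this reconstruction, and the value found for the probability of A together with Q₂, can be checked mechanically by running through all truth-value assignments. The sketch below is ours: it treats A, B and the connexion C as mutually independent with probabilities a, b and m, which is precisely the kind of assumption flagged above as missing from de Morgan's argument, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q

# Truth-table check of the three equivalences relied upon in the text.
for p, q, r in product([False, True], repeat=3):
    assert (p and not (p and q and r)) == (p and ((not q) or (not r)))
    assert (not (p and implies(p, q))) == (not (p and q))
    assert (p and not (q and not r)) == (p and not (q and p and not r))

def prob_a_and_q2(a, b, m):
    """Pr[A and not(C and not B)], summing over the eight cases, with A, B, C
    assumed independent and having probabilities a, b, m."""
    total = Fraction(0)
    for A, B, C in product([False, True], repeat=3):
        weight = (a if A else 1 - a) * (b if B else 1 - b) * (m if C else 1 - m)
        if A and not (C and not B):
            total += weight
    return total

a, b, m = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)
value = prob_a_and_q2(a, b, m)
print(value, a * (1 - m * (1 - b)))
```

Under the stated independence the enumeration returns a{1 − m(1 − b)}, in agreement with (28).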


This, in some measure, shows how de Morgan might have arrived at 
his results: the careful reader will note the lacunae in and assumptions 
missing from the argument. The most charitable thing one can say about 
the attempted reconstruction is that it is not implausible, and one must, 
like Hailperin [1988], conclude that 


we find it unprofitable to continue with the remainder of [de 
Morgan’s] solution of Problem 6. [p. 167] 


This concludes our discussion of the work of this august personage. 


8.5 Irenée Jules Bienaymé (1796-1878) 


In a memoir⁵³ published in 1838 and devoted to a direct proof of a result
by Laplace on the probability of the mean of observations, Bienaymé
opens with some historical remarks. Having mentioned Bernoulli’s and de 
Moivre’s results, he refers to the inverse problem in the following words: 


La solution ne fut donnée que soixante ans plus tard, par Bayes, 
savant anglais peu connu, sans doute parce qu’une mort trop 
prompte interrompit ses travaux, mais qui paraît avoir possédé
à un très-haut degré les qualités du géomètre. [p. 514]


He then cites a numerical example from Bernoulli, and states (wrongly) 
that Bayes had discussed an inverse to this particular numerical result. 
The fundamental result used in this memoir is the following: 


si un événement a été observé p fois sur un grand nombre $p + p_1$
d’épreuves, la probabilité que la possibilité de cet événement est
comprise dans les limites

\[ (1) \qquad \frac{p}{p+p_1} \pm c\sqrt{\frac{2pp_1}{(p+p_1)^3}} \]

est égale à l’intégrale définie

\[ (2) \qquad \frac{2}{\sqrt{\pi}}\int_0^{c} e^{-t^2}\,dt \]

[pp. 517-518].


Bienaymé next supposes that $\gamma, \gamma_1$ are two arbitrary functions of the
observed events, relative respectively to the event A, which has happened
p times, and to the event B, which has happened $p_1$ times. If $x, x_1$ denote
“les possibilités inconnues de ces deux événements” [p. 518], we wish to
determine the value of the quantity


\[ v = \gamma x + \gamma_1 x_1 . \]


Taking for this quantity the mean of the products of $\gamma, \gamma_1$ multiplied
respectively by $p, p_1$, he suggests that one try to find the probability that


\[ v_0 = (\gamma p + \gamma_1 p_1)/(p + p_1) \]


does not differ from the true value v by any given amount; and he notes
further that this question reduces to that given in the preceding quotation
when $\gamma = 1, \gamma_1 = 0$.

He then considers, preserving the previous notation, the finding of the
probability that $v = \gamma x + \gamma_1 x_1$ lies between two values a′ and a. Noting that
(for given x) the probability that A and B occur p and $p_1$ times respectively
is


\[ \binom{p+p_1}{p}\, x^{p}(1-x)^{p_1} , \]


Bienaymé suggests that one should find x and 1 − x from the expression
for v, whence


\[ x = (v - \gamma_1)/(\gamma - \gamma_1) , \qquad (1 - x) = (\gamma - v)/(\gamma - \gamma_1) . \]


Then, “dans l’hypothèse d’une valeur assignée à v” [p. 521], the compound
event has probability


\[ \binom{p+p_1}{p} \left(\frac{v-\gamma_1}{\gamma-\gamma_1}\right)^{p} \left(\frac{\gamma-v}{\gamma-\gamma_1}\right)^{p_1} . \]

Thus the probability “de l’hypothèse d’une valeur de v” [p. 521] will be


\[ (v-\gamma_1)^{p}(\gamma-v)^{p_1}\,dv \bigg/ \int_{\gamma_1}^{\gamma} (v-\gamma_1)^{p}(\gamma-v)^{p_1}\,dv , \]


where $\gamma_1 < \gamma$ (the limits of integration are given in the reverse order in the
original). Finally, the probability that the true value of v lies between a′
and a is found by integrating this last expression between the given limits.
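
Since v is a linear function of x, the probability just described is the same as that obtained by integrating the density proportional to x^p (1 − x)^{p₁} between the corresponding limits for x. The following sketch checks this numerically with a crude midpoint rule; the function names and numerical values are ours, and γ₁ < γ is assumed.

```python
def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def prob_v_between(p, p1, g, g1, lo, hi):
    """Ratio of the integrals of (v - g1)^p (g - v)^p1 over [lo, hi] and [g1, g]."""
    kernel = lambda v: (v - g1) ** p * (g - v) ** p1
    return integrate(kernel, lo, hi) / integrate(kernel, g1, g)

def prob_x_between(p, p1, lo, hi):
    """Ratio of the integrals of x^p (1 - x)^p1 over [lo, hi] and [0, 1]."""
    kernel = lambda x: x ** p * (1 - x) ** p1
    return integrate(kernel, lo, hi) / integrate(kernel, 0.0, 1.0)

p, p1, g, g1 = 7, 3, 5.0, 1.0                 # illustrative values only
in_v = prob_v_between(p, p1, g, g1, 3.0, 4.5)
in_x = prob_x_between(p, p1, (3.0 - g1) / (g - g1), (4.5 - g1) / (g - g1))
print(in_v, in_x)
```

With γ = 1 and γ₁ = 0 the two functions coincide, which is the reduction to the earlier case noted by Bienaymé.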

Attention is next paid to the case of three events, a procedure analogous 
to that detailed above being followed. At the conclusion of the exercise the 
curious statement is made that


l’on est dès lors complétement certain que la solution est indépendante
de la loi de probabilité des divers événements simples.
[p. 529]


An extension is also made to n events.

In 1840, in a paper published as an Extrait des procès-verbaux of the
Société Philomatique de Paris under the title “Sur la constance des causes,
conclue des effets observés”⁵⁴, Bienaymé proposed


un principe de probabilités qu’il croit entièrement nouveau, et
qui lui paraît susceptible de recevoir des applications continuelles
dans les sciences d’observation. [p. 1]


This new principle runs as follows: suppose that a number of experiments 
have been carried out, resulting in a mass of statistical data from which 
a certain mean result is to be determined. These data may be divided, in 
almost any natural way, into two or more groups, for each of which a mean 
result may be found. These means in turn will differ more or less from each 
other and from the mean of the original data set. Then, says the writer of 
the extract, speaking of the extent of the differences, 


il semble au premier coup-d’œil qu’elle devrait également dépendre
de la possibilité que donne aux phénomènes en question
la cause ou le système de causes qui les régit. Cependant il
n’en est rien, quand ce système de causes reste constant pendant
toute la durée des expériences. On démontre sans peine
que, dans ce cas, les relations de probabilité qui doivent exister
entre le résultat général et les résultats partiels sont absolument
indépendantes de la possibilité des phénomènes; il n’entre
dans les expressions qui les caractérisent que les résultats seuls
des observations faites, même alors que la loi de possibilité des
phénomènes est connue à l’avance. [pp. 2-3]


The principle is proved by considering the drawing of a large number c
of balls from an urn containing white and black balls in a known ratio, the
possibility of the drawing of a white ball being p. This draw results in a
white and b black balls. Next, suppose that the c draws are divided into
two series of m and n balls respectively. The probabilities of getting r and
q white balls in the two (sub-)series are


\[ \binom{m}{r} p^{r}(1-p)^{m-r} \quad\text{and}\quad \binom{n}{q} p^{q}(1-p)^{n-q} \]


respectively, the probability of r and q white balls in the combined series
thus being


\[ \binom{m}{r}\binom{n}{q}\, p^{r+q}(1-p)^{m+n-r-q} . \]


Recalling that c = m + n and that r + q = a, we see that


il faut en faire la somme, et diviser l’expression précédente par
cette somme. [pp. 3-4]


This leads to

\[ \binom{m}{r}\binom{n}{q} \bigg/ \binom{c}{a} , \]

and we note that the terms involving p disappear⁵⁵.
This latter expression, it is further noted, may be written as


\[ \binom{a}{r}\binom{b}{m-r} \bigg/ \binom{c}{m} , \]


which is recognized as


la possibilité de tirer r boules blanches et (m − r) noires d’une
urne contenant c boules, dont a blanches et b noires, quand on
y prend m boules au hazard, sans en remettre aucune. [p. 4]
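
The disappearance of p can be confirmed by exact arithmetic. The sketch below conditions the joint probability of r and q white balls on their total r + q = a and compares the result with the hypergeometric expression; the function names and numerical values are ours.

```python
from fractions import Fraction
from math import comb

def conditional(m, n, a, r, p):
    """Pr[r white among the first m draws | a white among all m + n draws],
    each draw yielding white with chance p."""
    def joint(i):
        q = a - i                             # white balls in the second series
        return comb(m, i) * comb(n, q) * p**a * (1 - p)**(m + n - a)
    support = range(max(0, a - n), min(m, a) + 1)
    return joint(r) / sum(joint(i) for i in support)

def hypergeometric(m, n, a, r):
    """Chance of r white and m - r black when m balls are taken, without
    replacement, from c = m + n balls of which a are white and b black."""
    c, b = m + n, m + n - a
    return Fraction(comb(a, r) * comb(b, m - r), comb(c, m))

m, n, a, r = 6, 4, 5, 3                       # illustrative values only
results = [conditional(m, n, a, r, p) for p in (Fraction(1, 3), Fraction(3, 4))]
print(results, hypergeometric(m, n, a, r))
```

The conditional probability is the same for every admissible value of p, and equals the hypergeometric term.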


It is then proposed to apply this principle to the question of whether a 
cause (or system of causes) varies over a series of experiments, and to do 
this 


il suffira de les diviser en séries partielles, et de calculer si les 
écarts des résultats moyens de ces subdivisions sont renfermés 
dans les limites que leur assigne le résultat moyen général. 


[p. 5] 


The formula appropriate to the case in which the series of trials is divided 
into two, in each of which trials either of only two outcomes is possible, is 
then discussed. Before passing to consideration of this formula, however, 
let us recall that, if X is a random variable having the hypergeometric 
distribution with 


\[ \Pr[X = x] = \binom{p}{x}\binom{q}{r-x} \bigg/ \binom{p+q}{r} , \]


where each von Ettingshausen symbol $\binom{u}{l}$ has $u \ge l$, then X has mean and
variance respectively given by


\[ E(X) = pr/(p+q) \]


\[ V(X) = pqr(p+q-r)/(p+q)^2(p+q-1) . \]


Making the transliteration to Bienaymé’s notation, we have


\[ \Pr[X = r] = \binom{a}{r}\binom{b}{m-r} \bigg/ \binom{c}{m} \]


\[ E(X) = am/c \]


\[ V(X) = abm(c-m)/c^2(c-1) . \]

For future use let us define $V_b(\cdot)$ by

\[ V_b(X) = abm(c-m)/c^3 , \]


so that⁵⁶

\[ V(X) = \frac{c}{c-1}\, V_b(X) . \]

Bienaymé’s problem and its solution run in full as follows:


on supposera qu’il a été observé a phénomènes d’un certain
genre sur un grand nombre c d’expériences, et que le phénomène
contraire a par suite eu lieu c − a = b fois. Si l’on prend une
série partielle de m de ces observations, on doit trouver, dans
l’hypothèse d’une cause constante, que le nombre des phénomènes
dont il s’est présenté a sur la masse, est, pour la série partielle,
compris entre les limites


\[ r = N \pm u\sqrt{\frac{2abm(c-m)}{c^3}} \]


(N étant le plus grand nombre entier renfermé dans $(m+1)\frac{a+1}{c+2}$)
avec une probabilité, exprimée par


\[ \frac{2}{\sqrt{\pi}}\int_0^{u} dt\, e^{-t^2} + \frac{e^{-u^2}}{\sqrt{2\pi\, abm(c-m)/c^3}} . \]


[pp. 5-6]
Denoting by X “le nombre des phénomènes”, we may write the desideratum as


\[ \begin{aligned}
& \Pr\bigl[N - u\sqrt{2V_b(X)} < X < N + u\sqrt{2V_b(X)}\bigr] \\
&\quad = \Pr\bigl[-u\sqrt{2} < (X - N)/\sqrt{V_b(X)} < u\sqrt{2}\bigr] ,
\end{aligned} \]


the integral approximation given by Bienaymé presumably being obtained
by a version of the Central Limit Theorem. Now a rigorous application of
this latter result would require consideration of


\[ \Pr\left[-u\sqrt{2} < \frac{X/m - E(X/m)}{\sqrt{V(X/m)}} < u\sqrt{2}\right] \]

where

\[ E(X/m) = a/c \]

\[ V(X/m) = ab(c-m)/mc^2(c-1) = \frac{1}{m^2}\,\frac{c}{c-1}\, V_b(X) . \]


Note too that

\[ \frac{am}{c} < \frac{(a+1)(m+1)}{c+2} < \frac{am}{c} + 1 , \]


so that when E(X) = am/c is an integer, it coincides with Bienaymé’s N.
Bienaymé’s integral approximation may be written as


\[ \frac{2}{\sqrt{\pi}}\int_0^{u} e^{-t^2}\,dt + e^{-u^2}\Big/\sqrt{2\pi V_b(X)} , \]


a form in which the use of $V_b(X)$ for the correct form of the variance is
transparent.

It is also noted that a formula derived by Laplace in another context 
may be used here, that formula being 


celle qui exprime les écarts probables d’un nombre m de nou- 
velles épreuves, quand déja on a fait c expériences qui ont donné 
a fois le phénomeéne attendu. [p. 6] 


Denoting by R the number of occurrences of the phenomenon in the m new
trials, we find that


\[ \begin{aligned}
& \Pr\bigl[N - u\sqrt{2V_l(R)} < R < N + u\sqrt{2V_l(R)}\bigr] \\
&\quad \approx \frac{2}{\sqrt{\pi}}\int_0^{u} e^{-t^2}\,dt + e^{-u^2}\Big/\sqrt{2\pi V_l(R)}
\end{aligned} \]


where $V_l(R) = abm(c+m)/c^3$ and N is the integral part of (m + 1)a/c.⁵⁷
The differences between $V$, $V_b$ and $V_l$ are in fact unimportant in view of
the asymptotic nature of the integral approximation.
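
A numerical illustration may help to show how the three variances compare and how well Bienaymé's approximation performs. The sketch below is ours: the interval is read as including its end points, the values of a, b, m and the half-width k are chosen only for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, erf, exp, pi, sqrt

def variances(a, b, m):
    """Exact hypergeometric variance, Bienaymé's form and Laplace's predictive form."""
    c = a + b
    v = Fraction(a * b * m * (c - m), c**2 * (c - 1))
    v_b = Fraction(a * b * m * (c - m), c**3)
    v_l = Fraction(a * b * m * (c + m), c**3)
    return v, v_b, v_l

def exact_hypergeometric(a, b, m, lo, hi):
    """Exact Pr[lo <= X <= hi] for the hypergeometric X of the text."""
    c = a + b
    return float(sum(Fraction(comb(a, r) * comb(b, m - r), comb(c, m))
                     for r in range(lo, hi + 1)))

def bienayme_approx(a, b, m, half_width):
    """Integral term plus correction term, with u = half_width / sqrt(2 V_b)."""
    v_b = float(variances(a, b, m)[1])
    u = half_width / sqrt(2 * v_b)
    return erf(u) + exp(-u * u) / sqrt(2 * pi * v_b)

a, b, m, k = 600, 400, 200, 9                 # illustrative values only
N = (m + 1) * (a + 1) // (a + b + 2)          # greatest integer in (m+1)(a+1)/(c+2)
v, v_b, v_l = variances(a, b, m)
exact = exact_hypergeometric(a, b, m, N - k, N + k)
approx = bienayme_approx(a, b, m, k)
print(N, float(v), float(v_b), float(v_l), exact, approx)
```

Here N coincides with am/c = 120, the exact variance exceeds Bienaymé's by the factor c/(c − 1) only, and the approximate and exact probabilities differ by well under one per cent.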


8.6 Mikhail Vasil’evich Ostrogradskii (1801-1861)⁵⁸


In an extract from his memoir “Sur la probabilité des erreurs des tribunaux”
[1838], a paper read on the 12th June 1834, Ostrogradskii⁵⁹ is reported as
having considered the case of a tribunal in which the different veracities of
the judges lie within known limits. The probability of an error being made
by such a tribunal consisting of a given number of judges is determined,
and the limits of the veracities after a decision has been reached are found.
The memoir is concluded with a discussion of the case of equal veracities,
a matter considered earlier by Laplace and Condorcet, and in view of the
attention given to these authors in the present work, it seems not inadvisable
to consider this memoir here.

Under the hypothesis that the veracities of the judges all lie within the
same limits Ostrogradskii finds that the probability of “l’erreur à craindre”
depends only on the majority, that is, on the difference between the
numbers of judges of opposing views. Laplace and Condorcet had thought
that such a result was contrary to common sense, but Ostrogradskii, no
doubt conscious of the danger of slavish submission to authority in science,
maintains that there is nothing in his analysis to warrant such an opinion.

He cites the example given by Laplace in his Essai philosophique sur
les probabilités on the difference between the probability of an error in a 
judgment unanimously rendered by twelve judges and the probability of 
error in a judgment given by a majority of twelve votes in a tribunal of two 
hundred and twelve judges, and follows this with the following example: 


Pour avoir moins à discuter, comparons un seul juge se prononçant
affirmativement dans une question, à un tribunal A de trois
juges, dont deux se prononcent affirmativement, le troisième
négativement. Sans rien changer à la question, on peut remplacer
le seul juge par un tribunal B de trois juges, dont un
affirme, et les opinions des deux autres sont inconnues. Nous
pourrons, relativement au tribunal B, faire trois hypothèses
suivantes:

1° Les deux juges à opinions inconnues sont de même avis
que le premier.

2° L’un des deux partage l’opinion du premier, et l’autre ne
la partage pas.

3° Tous les deux contredisant le premier.

La seconde hypothèse est exactement dans le cas du tribunal A,
la première est à l’avantage du tribunal B, ou ce qui revient au
même, à l’avantage d’un seul juge, et la dernière, au contraire,
est à l’avantage du tribunal A; or, je ne vois pas pourquoi la
première hypothèse augmenterait la probabilité d’un seul juge,
plus que la dernière ne l’affaiblit. [p. xx]


Having compared his example with Laplace’s, Ostrogradskii asks 


D’où vient la grande différence dans la confiance que nous accordons
au même nombre de juges, également véridiques, et
dans la même situation relativement à nous? Cette différence,
il n’y en a point; nous sommes induits en erreur, faute d’avoir
suffisamment approfondi la matière, [p. xx]


and he concludes this discussion by noting that 


S’il est vrai qu’on est porté à considérer comme nulle la décision
d’un nombreux tribunal, rendu à une très faible majorité, et
qu’au contraire, on donne un grand poids à une décision unanime
du tribunal composé d’un petit nombre de juges, je crois que
ce qui nous y porte est plutôt un préjugé, que le bon sens et la
considération exacte de la matière. [p. xxi]


Ostrogradskii notes further that what he has just said does not determine 
whether his method or that proposed by Laplace and Condorcet is correct; 
the matter can however be decided by the consideration of an objection 
he raises to the work of “ces géométres célébres”, an objection resting not 
on the probabilistic principles employed but rather on the manner of their 
employment. Ostrogradskii’s point is that while the veracities of the judges 
may have the same limits, they should run independently of each other 
from the lower to the upper limit (which will necessitate the consideration 
of as many integrals as there are judges, rather than the single integral 
used before). As a simple example he supposes that the three veracities 
1/2, 3/4 and 1 are possible: whereas the method advanced by Laplace and 
Condorcet would require our supposing that all n judges have the same
veracity, Ostrogradskii’s method would allow any one of the $3^n$ combinations.

The writer of the Extract now cites various formulae given by Ostrogradskii,
formulae that lead to the following result: suppose that, of a tribunal
of m + n judges, m vote for the conviction and n for the acquittal of an
accused. Let $x_1,\ldots,x_m$ be the veracities of those who vote for conviction,
and $y_1,\ldots,y_n$ the veracities of those who vote for acquittal. Then

$$\Pr[\text{the accused is innocent} \mid \text{he is convicted}] = \frac{\prod_{i=1}^{m}(1-x_i)\prod_{j=1}^{n}y_j}{\prod_{i=1}^{m}x_i\prod_{j=1}^{n}(1-y_j) + \prod_{i=1}^{m}(1-x_i)\prod_{j=1}^{n}y_j}. \tag{29}$$

Since the $x_i$ and $y_j$ may each take on an infinity of different values, however,
this expression must be multiplied by the probability “de l’existence
simultanée des véracités $x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n$” [p. xxii]

$$\frac{\left[\prod_{i=1}^{m}x_i\prod_{j=1}^{n}(1-y_j) + \prod_{i=1}^{m}(1-x_i)\prod_{j=1}^{n}y_j\right]d\mathbf{x}\,d\mathbf{y}}{\int\!\!\int\left[\prod_{i=1}^{m}x_i\prod_{j=1}^{n}(1-y_j) + \prod_{i=1}^{m}(1-x_i)\prod_{j=1}^{n}y_j\right]d\mathbf{x}\,d\mathbf{y}} \tag{30}$$

where $d\mathbf{x} = dx_1\cdots dx_m$ and $d\mathbf{y} = dy_1\cdots dy_n$.
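It is a little difficult to follow the report here: presumably the argument
runs (not too precisely) as follows:

$$\Pr[\text{error}] = \int\!\!\int \Pr[\text{error} \wedge \mathbf{X}=\mathbf{x}, \mathbf{Y}=\mathbf{y}]\,d\mathbf{y}\,d\mathbf{x}, \tag{31}$$

while the integrand may be written as

$$\begin{aligned}\Pr[\text{error} \wedge \mathbf{X}=\mathbf{x}, \mathbf{Y}=\mathbf{y}] &= \Pr[\text{error} \mid \mathbf{X}=\mathbf{x}, \mathbf{Y}=\mathbf{y}]\,\Pr[\mathbf{X}=\mathbf{x}, \mathbf{Y}=\mathbf{y}]\\ &= \Pr[\text{accused is innocent} \mid \text{he is convicted}]\,\Pr[\mathbf{X}=\mathbf{x}, \mathbf{Y}=\mathbf{y}].\end{aligned}$$

The right-hand side of this last expression is then the product of (29) and
(30), or

$$\frac{\prod_{i=1}^{m}(1-x_i)\prod_{j=1}^{n}y_j\,d\mathbf{x}\,d\mathbf{y}}{\int\!\!\int\left[\prod_{i=1}^{m}x_i\prod_{j=1}^{n}(1-y_j) + \prod_{i=1}^{m}(1-x_i)\prod_{j=1}^{n}y_j\right]d\mathbf{x}\,d\mathbf{y}}.$$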
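Substitution of this in (31) and integration over $x_i \in [x_i', x_i'']$, $y_j \in [y_j', y_j'']$
for all appropriate i and j show that the probability that the decision of
the tribunal is wrong is

$$\frac{\prod_{i=1}^{m}(2-x_i'-x_i'')\prod_{j=1}^{n}(y_j'+y_j'')}{\prod_{i=1}^{m}(x_i'+x_i'')\prod_{j=1}^{n}(2-y_j'-y_j'') + \prod_{i=1}^{m}(2-x_i'-x_i'')\prod_{j=1}^{n}(y_j'+y_j'')}.$$

It is then noted that “Il est remarquable que la probabilité précédente
ne dépende que des sommes des valeurs extrêmes des véracités” [p. xxii],
so that the probability of an error remains the same if all the quantities
$x_i' + x_i''$, $y_j' + y_j''$ are the same. Denoting this common value by z one may
write the probability of an error as

$$\left[1 + \left(\frac{z}{2-z}\right)^{m-n}\right]^{-1}, \tag{32}$$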


a result that also obtains when the limits of the veracities are the same, 
and that clearly shows the dependence of the probability of an error on 
the difference m — n — that is, on the majority of judges in favour of a 
conviction. It is also noted that, for z = 1, the above probability becomes 
1/2, while 


La même fraction 1/2 représentera aussi la probabilité de la
validité du jugement; ainsi dans le cas où la somme des limites
des véracités de chaque juge est égale à l’unité, on est
dans une indécision complète sur la valeur d’une décision; il
reviendrait au même de remettre au hasard le sort du prévenu,
pourvu qu’on égalise les chances pour la condamnation et pour
l’absolution. La décision d’un tribunal n’acquerra une valeur
que dans le cas où la somme des véracités extrêmes dépasse
l’unité, et plus cette somme s’approche de la limite supérieure
2, plus on doit espérer de ne voir que des décisions conformes à
la vérité. [p. xxii]
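The dependence of the error probability on the sums of the limits alone can be checked numerically. The following is a minimal sketch, assuming (as the report seems to) that each veracity is uniformly distributed over its own interval; the function names `error_probability` and `formula_32` are illustrative only and do not come from Ostrogradskii's memoir.

```python
from fractions import Fraction
from math import prod

def error_probability(convict_limits, acquit_limits):
    """Error probability when each veracity is uniform on its own interval.

    Each argument is a list of (lower, upper) pairs, one pair per judge.
    """
    # Integrating x over [lo, hi] gives a factor proportional to (lo + hi);
    # integrating (1 - x) gives one proportional to 2 - (lo + hi). The
    # interval widths and the factors 1/2 cancel in the ratio.
    wrong = (prod(2 - (lo + hi) for lo, hi in convict_limits)
             * prod(lo + hi for lo, hi in acquit_limits))
    right = (prod(lo + hi for lo, hi in convict_limits)
             * prod(2 - (lo + hi) for lo, hi in acquit_limits))
    return Fraction(wrong) / (wrong + right)

def formula_32(z, m, n):
    """Closed form [1 + (z / (2 - z))**(m - n)]**(-1) for a common sum z."""
    z = Fraction(z)
    return 1 / (1 + (z / (2 - z)) ** (m - n))

half_to_one = (Fraction(1, 2), Fraction(1))
print(error_probability([half_to_one] * 3, [half_to_one]))  # 3 votes to 1
print(formula_32(Fraction(3, 2), 3, 1))                     # same value
print(formula_32(1, 5, 2))                                  # z = 1 gives 1/2
```

Exact rational arithmetic is used so that the agreement between the product form and the closed form is an identity rather than a floating-point coincidence.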
If the limits of the veracities are set at 1/2 and 1, then z = 3/2 and (32)
becomes

$$\left(1 + 3^{m-n}\right)^{-1},$$

whereas Laplace’s formula would give the probability of an error by the
tribunal as

$$I_{1/2}(m+1, n+1) = \int_0^{1/2} x^m(1-x)^n\,dx \Big/ \int_0^1 x^m(1-x)^n\,dx.$$


Comparison of these two formulae may be effected by the recollection of
the relations

$$I_x(a,b) = 1 - I_{1-x}(b,a)$$

and

$$I_x(a,b) = x\,I_x(a-1,b) + (1-x)\,I_x(a,b-1).$$

On setting x = 1/2 one gets

$$I_{1/2}(m,m) = 1/2,$$

and, for k > 0,

$$\begin{aligned}I_{1/2}(n+k, n+1) &= \frac{1}{2^k} + \sum_{j=2}^{k}\frac{1}{2^{k-j+1}}\,I_{1/2}(n+j, n),\\ I_{1/2}(n+1, n+k) &= 1 - I_{1/2}(n+k, n+1).\end{aligned}$$

Thus

$$\left(1+3^{m-n}\right)^{-1}\;\begin{cases} < I_{1/2}(m,n), & m-n > 0,\\ = 1/2, & m-n = 0,\\ > I_{1/2}(m,n), & m-n < 0.\end{cases}$$
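The two error probabilities can be tabulated side by side. The sketch below assumes limits 1/2 and 1 for the veracities and evaluates the incomplete beta ratio at 1/2 for positive integer arguments through the standard binomial-tail identity; the function names are illustrative, and no claim is made here about which inequalities hold in every case.

```python
from fractions import Fraction
from math import comb

def inc_beta_half(a, b):
    """I_{1/2}(a, b) for positive integers a, b (binomial-tail identity)."""
    total = a + b - 1
    tail = sum(comb(total, j) for j in range(a, total + 1))
    return Fraction(tail, 2 ** total)

def laplace_error(m, n):
    """Laplace's value I_{1/2}(m + 1, n + 1) for m votes against n."""
    return inc_beta_half(m + 1, n + 1)

def ostrogradskii_error(m, n):
    """Ostrogradskii's value (1 + 3**(m - n))**(-1)."""
    return 1 / (1 + Fraction(3) ** (m - n))

for m, n in [(2, 0), (3, 1), (3, 3), (1, 3)]:
    print(m, n, ostrogradskii_error(m, n), laplace_error(m, n))
```

Only the difference m − n enters the first column of results, whereas Laplace's value changes with m and n separately, which is the point at issue between the two methods.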


Suppose next that it is not known which m of the m+n judges vote for a
conviction and which n vote for an acquittal, and let $x_1,\ldots,x_{m+n}$ denote
the veracities of the judges, with $x_i \in [x_i', x_i'']$. Further, let V denote what
is in fact the probability generating function, i.e.

$$V = \prod_{i=1}^{m+n}\left(x_i + (1-x_i)\,y\right).$$

Then P, the coefficient of $y^m$ in the expansion of V, will be the probability
that the tribunal is divided into two groups of sizes m and n, with veracity
being on the side of the n judges. Denoting by Q the similar coefficient of
$y^n$, a coefficient indicating that veracity is on the side of the m judges, we
find that the probability of an error on the part of the m judges under the
hypothesis that each veracity has a single value is P/(P+Q). Note that V
can be written as

$$V = \sum_{k=0}^{m+n} c_k\,y^k,$$

where each $c_k$ is a sum of products, each of these products containing
exactly k terms of the form $(1 - x_i)$. [For example, if m+n = 3,

$$\begin{aligned}c_0 &= x_1x_2x_3\\ c_1 &= x_1x_2(1-x_3) + x_1(1-x_2)x_3 + (1-x_1)x_2x_3\\ c_2 &= x_1(1-x_2)(1-x_3) + (1-x_1)x_2(1-x_3) + (1-x_1)(1-x_2)x_3\\ c_3 &= (1-x_1)(1-x_2)(1-x_3)\;]\end{aligned}$$
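The coefficients of the generating function are easily computed by multiplying out one factor at a time. The sketch below is illustrative only (the names `gf_coefficients` and `error_given_split` are not from the source) and treats the single-value case, in which every veracity is a fixed number.

```python
from fractions import Fraction

def gf_coefficients(veracities):
    """Coefficients c_0, ..., c_N of V(y) = prod_i (x_i + (1 - x_i) y)."""
    coeffs = [Fraction(1)]
    for x in veracities:
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * x              # this judge is right: no power of y
            nxt[k + 1] += c * (1 - x)    # this judge errs: one more power of y
        coeffs = nxt
    return coeffs

def error_given_split(veracities, m):
    """P / (P + Q), with P the coefficient of y**m and Q that of y**n."""
    n = len(veracities) - m
    c = gf_coefficients(veracities)
    return c[m] / (c[m] + c[n])

three_judges = [Fraction(3, 4)] * 3
print(gf_coefficients(three_judges))       # c_0, c_1, c_2, c_3
print(error_given_split(three_judges, 2))  # majority of 2 against 1
```

With a common veracity of 3/4 and a majority of one, the result agrees with $[1 + (x/(1-x))^{m-n}]^{-1}$ evaluated at x = 3/4.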


Next, since veracities may be viewed as continuous real variables, the
probability that they are exactly $x_1,\ldots,x_{m+n}$ is

$$(P+Q)\,d\mathbf{x}\Big/\int (P+Q)\,d\mathbf{x},$$

and the probability of an error on the part of the m judges will be

$$\int P\,d\mathbf{x}\Big/\int (P+Q)\,d\mathbf{x}. \tag{33}$$

On setting $y = e^{iz}$ one finds that

$$P = \frac{1}{2\pi}\int_{-\pi}^{\pi} V e^{-imz}\,dz, \qquad Q = \frac{1}{2\pi}\int_{-\pi}^{\pi} V e^{-inz}\,dz.$$

Substitution of these expressions in (33) and appropriate integration show
that the probability of the validity of the judgment depends once again
only on the sums $x_i' + x_i''$, and if these sums are all the same (= z, say) one
finds that the probability of an error is, as before,

$$\left[1 + \left(\frac{z}{2-z}\right)^{m-n}\right]^{-1}.$$


In a memoir read on the 23rd October 1846, entitled “Sur une question 
des probabilités”, Ostrogradskii considers the following problem (one that 
he in fact relates to quality control): 


Un vase renferme des billes blanches et noires dont on connaît
le nombre total, mais on ignore ce qu’il y a de chaque couleur.
On en retire un certain nombre et, après avoir compté parmi
celles-ci les blanches et les noires et les avoir remis dans le vase,
on demande la probabilité que le total des billes blanches ne
s’écartera pas des limites qu’on voudra assigner. Ou plutôt, on
demande la relation entre la probabilité et des limites dont il
s’agit. [p. 321]


To make things more accessible to his readers, Ostrogradskii proposes first 
to consider the following question: 


On est certain qu’un vase renferme un nombre donné de billes
blanches et noires sans mélange d’aucune autre couleur. On
ignore absolument la proportion des deux couleurs. ... On est
également certain qu’on retirera du vase, ou qu’on en ait déjà
retiré, un nombre donné de billes que nous désignerons par l.
On demande la probabilité que dans ce nombre l il y aura n
billes blanches et m noires. [p. 323]


Ostrogradskii approaches the solution of his problem by considering a num- 
ber of simpler problems. For instance, in §4 we find the following question: 


Supposons maintenant qu’on ait retiré du vase l billes, qu’on
ait trouvé, dans ce nombre, n blanches et m noires, et qu’on
demande la probabilité que, dans s − l billes non sorties, il se
trouve x blanches et y noires. [p. 326]


The solution to the problem, he further notes, is based on the following
known principle:


La probabilité d’une hypothèse est égale à la probabilité du fait,
tirée de cette hypothèse supposée certaine, divisée par la somme
des probabilités semblables relatives à toutes les hypothèses,
[p. 327]


and it is also noted that this form of the principle applies in the case in 
which “a priori toutes les hypothéses sont également admissibles” (loc. cit.). 

Assuming that all possible constitutions of the sample are equally probable,
the chance of getting n white and $(\ell - n)$ black balls will be $1/(\ell + 1)$.
Suppose that the urn contains s balls, x of which are white. Denoting by
$U(\alpha, \beta)$ [respectively $S(\alpha, \beta)$] the event that the urn [the sample] contains
$\alpha$ white and $\beta$ black balls, we have⁶⁰

$$\Pr[S(n,\ell-n) \mid U(x,s-x)] = \binom{x}{n}\binom{s-x}{\ell-n}\Big/\binom{s}{\ell}.$$

Thus, again under an assumption of equiprobability,

$$\begin{aligned}\Pr[U(x,s-x) \mid S(n,\ell-n)] &= \Pr[S(n,\ell-n) \mid U(x,s-x)]\,\Pr[U(x,s-x)]\big/\Pr[S(n,\ell-n)]\\ &= \binom{x}{n}\binom{s-x}{\ell-n}\frac{1}{\binom{s}{\ell}}\cdot\frac{1}{s+1}\Big/\frac{1}{\ell+1}\\ &= \binom{x}{n}\binom{s-x}{\ell-n}\Big/\binom{s+1}{\ell+1}.\end{aligned}$$
It thus follows that the probability that the number of white balls (say W)
in the urn does not exceed some specified q, but is greater than or equal to
some specified p, when there are n white and $m = \ell - n$ black balls in the
sample, is given by

$$\Pr[p \le W \le q \mid S(n,\ell-n)] = \sum_{j=p}^{q}\binom{j}{n}\binom{s-j}{\ell-n}\Big/\binom{s+1}{\ell+1}.$$
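The posterior distribution of the urn's composition lends itself to a direct check. The following sketch assumes, as in the text, a uniform prior over the number x of white balls in an urn of s balls and a sample of l balls drawn without replacement; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def posterior_urn(s, l, n):
    """Pr[x white in the urn | n white among l drawn], for x = 0, ..., s."""
    return {x: Fraction(comb(x, n) * comb(s - x, l - n), comb(s + 1, l + 1))
            for x in range(s + 1)}

def prob_between(s, l, n, p, q):
    """Pr[p <= W <= q | n white and l - n black in the sample]."""
    post = posterior_urn(s, l, n)
    return sum(post[j] for j in range(p, q + 1))

post = posterior_urn(10, 4, 3)
print(sum(post.values()))             # the posterior sums to one
print(prob_between(10, 4, 3, 6, 9))   # chance that the urn holds 6 to 9 whites
```

Compositions incompatible with the sample (fewer than n white balls, or fewer than l − n black) receive probability zero automatically, since the binomial coefficients vanish.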


Several pages are devoted to computational formulae (with some specific
numerical examples), among which the following (which can be proved
“aussi facilement que le binôme de Newton” [p. 332]) may well be worth
recording: let $[x]^k = x(x-1)\cdots(x-k+1)$ for (non-negative?) integral k.
Then

$$[x+a]^n = \sum_{k=0}^{n}\binom{n}{k}[x]^k\,[a]^{n-k}.$$
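The formula recorded here is presumably the factorial analogue of the binomial theorem, in which ordinary powers are replaced by falling factorials. Under that assumption it can be verified over a small range of integers as follows (the helper names are illustrative).

```python
from math import comb

def falling(x, k):
    """Falling factorial [x]^k = x (x - 1) ... (x - k + 1), with [x]^0 = 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def factorial_binomial(x, a, n):
    """Right-hand side: sum over k of C(n, k) [x]^k [a]^(n - k)."""
    return sum(comb(n, k) * falling(x, k) * falling(a, n - k)
               for k in range(n + 1))

print(falling(7, 3), factorial_binomial(4, 3, 3))  # both equal [4 + 3]^3
```

Dividing both sides by n! turns the identity into Vandermonde's convolution for binomial coefficients, which is the form used later in the normalization of the posterior probabilities.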

The third paper by Ostrogradskii to warrant our attention, “Sur la prob- 
abilité des hypotheses d’aprés les évenements” was read on the 18th March 
1859. Here the method for finding “la probabilité des événements futurs 
d’aprés les événements passés” [p. 516] is attributed to Bayes and Price, 
with Laplace being regarded as the first to avail himself of this method. 
The above quotation seems to us in fact to be more in keeping with Price’s 
Appendix to Bayes’s Essay than with the latter, though the application 
made, as the following quotation shows, seems to have more to do with 
Bayes’s Theorem itself: 


On attend un événement, qui pourtant pourrait n’avoir pas
lieu, son arrivée est expliquable par n différentes hypothèses
$h_1, h_2, h_3, \ldots, h_n$. Ces hypothèses sont les seules possibles et
elles s’excluent mutuellement, c’est-à-dire qu’il serait contradictoire
d’en admettre simultanément deux ou un plus grand
nombre. Une certaine chance différente de zéro est attachée à
l’existence de chacune d’elles, et aussi chaque hypothèse donne
des chances à l’arrivée de l’événement, mais parmi les nombres
de ces dernières, nombres propres à chaque hypothèse, peuvent
se trouver qui sont égaux à zéro. L’événement attendu arrive,
pour lors une des hypothèses $h_1, h_2, h_3, \ldots, h_n$ a eu lieu, trouver
la probabilité, que ce soit une hypothèse indiquée à volonté $h_i$.
[p. 516]


Ostrogradskii derives here what is essentially the discrete Bayes’s Theorem,
giving the probability of the hypothesis $h_i$ after an event E has
occurred as

$$\frac{(s_i/S)(f_i/s_i)}{\sum_{j=1}^{n}(s_j/S)(f_j/s_j)} = \frac{f_i}{\sum_{j=1}^{n} f_j},$$

where S denotes “le nombre des chances qui existent avant l’arrivée de
l’événement et dont chacune amène une des hypothèses $h_1, h_2, h_3, \ldots, h_n$”
[p. 517], $s_i$ denotes the number of chances leading to $h_i$, and $f_i$ is the number
of the $s_i$ chances that are at the same time favourable to the event.
Laplace’s basic principle, viz. 


Si un événement peut être produit par un nombre n de causes
différentes, les probabilités de l’existence de ces causes prises de
l’événement, sont entre elles comme les probabilités de l’événement
prises de ces causes, et la probabilité de l’existence de
chacune d’elles, est égale à la probabilité de l’événement prise
de cette cause divisée par la somme de toutes les probabilités
de l’événement prises de chacune de ces causes, [p. 520]


is then quoted, and criticized on the grounds of its concern only with the
case in which the hypotheses are equally probable a priori. Ostrogradskii
notes too that in the Théorie analytique des probabilités


il [i.e. Laplace] y admet l’égalité entre le produit de la probabilité
de l’événement a priori, par celle d’une hypothèse d’après
l’événement, et le produit de la probabilité de la même hypothèse
a priori par celle de l’événement d’après l’hypothèse.
[p. 521]


He proves this (no proof was given by Laplace) by noting that the probability
of the event a priori is F/S (where $F = \sum f_i$) while that of $h_i$ given
the event is $f_i/F$, the product being $f_i/S$. Similarly,

$$\Pr[h_i]\,\Pr[E \mid h_i] = \frac{s_i}{S}\cdot\frac{f_i}{s_i} = \frac{f_i}{S}.$$

que celui qui précède. Mais il ne s’agissait pas de vérifier le
principe par la valeur obtenue pour l’inconnue, il fallait au contraire
se servir du principe pour la détermination de l’inconnue.
Au surplus il se peut que le principe en question était pour
Laplace d’une entière évidence et n’exigeait aucune démonstration,
quant à nous, nous avouons qu’il ne nous paraît avoir ce
degré d’évidence. [pp. 521-522]


He also notes that Poisson had considered this question in his Recherches
sur la probabilité des jugements.

Gnedenko is quoted by Maistrov [1974] as evaluating Ostrogradskii’s 
contributions to probability as follows: 


In spite of the fact that in his definition of probability Ostro- 
gradskii committed methodological errors, slipping towards a 
philosophy of subjectivism, the general direction of his creative 
work in probability theory should be evaluated as instinctively 
materialistic. [p. 187.] 


8.7 Thomas Galloway (1796-1851) 


Writing in the seventh edition of the Encyclopedia Britannica of 1839, 
Galloway comments⁶¹ on the noteworthiness of Bayes’s two papers in the
Philosophical Transactions for 1763 and 1764. In Section V, entitled “Of 
the probability of future events deduced from experience”, he gives the 
expressions 


wi = MPQAP; 


m2)" dw | fa Pia ies 


and from the latter deduces the rule of succession. There is no mention of
Bayes or Price here.


8.8 Eugène Charles Catalan (1814-1894)


In five papers, spread over some forty years, Catalan considered essentially 
the same problem, one that we may loosely phrase as follows: how does 
the probability of drawing a white ball from an urn change under various 
modifications of the contents? 

The first paper, published in 1841 in the Journal de Liouville, is entitled 
“Deux problémes de probabilités”; and the first of these problems runs as 
follows⁶²:

Une urne A contient b boules blanches et n boules noires. On en
extrait, par hasard, m boules que l’on place, sans les connaître,
dans une seconde urne B, laquelle renferme alors m boules,
blanches et noires, en proportion inconnue. On tire de cette
urne, successivement, p boules; et il arrive que toutes sont blanches.
Quelle est la probabilité que, faisant un tirage de plus, on
obtiendra encore une boule blanche? [p. 75]


Three different cases present themselves, depending on whether m is less
than, between or greater than b and n. Being persuaded that all cases lead
to the same result⁶³, Catalan proposes to consider in detail only the first;
his argument runs as follows:

Before the extraction of the p white balls, urn B’s composition may be
given by any one of the m − p + 1 hypotheses

$$H_{m-p-i+1}:\; p + (i-1) \text{ white and } (m-p-i+1) \text{ black},$$

$i \in \{1, 2, \ldots, m-p+1\}$. Now the probability that B has (m − i) white
balls is proportional to the probability of withdrawing from A (m − i)
white balls in m draws. Moreover, asserts Catalan,


Elle est proportionnelle aussi à la probabilité d’extraire p boules
blanches d’une urne qui en contiendrait m − i blanches et i
noires. [p. 76]


Letting b + n = s, we find that

$$\Pr[H_i] = \binom{m}{i}(b)_{m-i}(n)_i\Big/(s)_m,$$

where $(z)_k = z(z-1)\cdots(z-k+1)$, while

$$\Pr[p \text{ white balls drawn} \mid H_i] = (m-i)_p/(m)_p.$$

Hence

$$\begin{aligned}\Pr[H_i \mid p \text{ white balls drawn}] &\propto \binom{m}{i}(b)_{m-i}(n)_i(m-i)_p\Big/(s)_m(m)_p\\ &= \left[(m-p)!\,(b)_p/(s)_m\right]\left[(b-p)_{m-p-i}(n)_i\big/(m-p-i)!\,i!\right].\end{aligned}$$

Denoting the last square-bracketed term by $A_i$, one finds that

$$\omega_i = \Pr[H_i \mid p \text{ white balls drawn}] = A_i\Big/\sum_{j=0}^{m-p} A_j.$$

Recognizing that the denominator in this last expression is the coefficient
of $u^{b-m+n}v^{m-p}$ in the expansion of $(u+v)^{b+n-p}$, Catalan deduces that

$$\omega_i = (b-p)_{m-p-i}(n)_i\,(m-p)!\Big/(m-p-i)!\;i!\;(s-p)_{m-p}.$$

We now pass on to consider the further drawing of a white ball from B.
The probability of this event, if $H_i$ obtains, being

$$\omega_i(m-p-i)/(m-p),$$

it follows that

$$P = \Pr[1 \text{ white} \mid p \text{ white drawn}] = \sum_{i=0}^{m-p}\omega_i(m-p-i)/(m-p),$$

an expression that simplifies to

$$P = (b-p)/(s-p).$$


The independence of this result of the number of balls in B should be noted, 
and Catalan notes further that the introduction of the urn B is an unnec- 
essary affectation: one might as well suppose an appropriate partitioning 
of the balls in the initial urn A. 
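Catalan's simplification can be confirmed by brute force. The sketch below (the function name `next_white_probability` is illustrative) sums over the possible compositions of urn B, weighting each by the hypergeometric probability of its arising from A and by the probability of then drawing p white balls in succession; it assumes p < m and p ≤ b.

```python
from fractions import Fraction
from math import comb

def next_white_probability(b, n, m, p):
    """Pr[next ball from B is white | first p balls drawn from B were white]."""
    s = b + n
    joint = evidence = Fraction(0)
    for i in range(m + 1):                  # hypothesis: i black balls in B
        white = m - i
        prior = Fraction(comb(b, white) * comb(n, i), comb(s, m))
        likelihood = Fraction(comb(white, p), comb(m, p))  # p whites running
        evidence += prior * likelihood
        joint += prior * likelihood * Fraction(white - p, m - p)
    return joint / evidence

print(next_white_probability(5, 4, 6, 2))   # compare with (b - p)/(s - p)
print(next_white_probability(5, 4, 3, 2))   # a different m, same answer
```

Changing m while keeping b, n and p fixed leaves the answer unaltered, which illustrates the remark that the intermediate urn B is dispensable.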

The second problem is a natural development of the first: 


Une urne contient b boules blanches et n boules noires; une
autre urne renferme b’ boules blanches et n’ boules noires. On
tire au hasard p boules de la première urne, et p’ boules de
la seconde; et l’on réunit ces p + p’ boules dans une troisième
urne. Quelle est la probabilité d’extraire de celle-ci une boule
blanche? [p. 79]


The desired probability P is given, by the previous result, as

$$P = \frac{bp}{s(p+p')} + \frac{b'p'}{s'(p+p')},$$


and the extension to m such urns is also given. 
In 1877 Catalan published a paper entitled “Un nouveau principe de
probabilités”, in which he announced the following result⁶⁴:


La probabilité d’un événement futur ne change pas lorsque les 
causes dont il dépend subissent des modifications inconnues. 
[p. 463] 


Although there is a glimmering of this result in some of Poisson’s work (not 
cited here), the principle merits the qualification “nouveau” in as much as 
this appears to be the first proof. 

The proof presented by Catalan is somewhat curious: suppose that an
urn A contains b white balls and n − b balls of other shades. If p (which may
be known or unknown) balls are drawn from A and, unobserved, placed in
an urn B,⁶⁵

les probabilités d’extraire une boule blanche, soit de cette urne
B, soit de l’urne A, dont la composition a été modifiée, sont
égales à b/n. [p. 465]

On the other hand, one may leave the urn A in its original state, and
consider the white ball drawn as coming from an isolated group of p balls
within A. This group replaces urn B, while the remaining (n − p) balls
correspond to the modified urn A. “Le théorème est donc démontré”
[p. 465].

As an application of his theorem Catalan considers in the third section 
of this paper the following problem: 


Une urne A contient 4 boules blanches et 3 boules noires. On 
en tire, sans les compter ni les regarder, un certain nombre de 
boules. Quelle est la probabilité d’extraire une boule blanche, 
de l’urne A modifiée? [p. 467] 


The eighteen possible withdrawals are enumerated (the cases in which no
balls are drawn or all seven are drawn are omitted). Jongmans and Seneta
[1994] have pointed out that the drawing “four white and one black” is
mistakenly given in Catalan’s table as “four white and three black”⁶⁶; but
since this case (in either form) contributes nought to the calculation of the
required probability, the slip is of no consequence in the final analysis.
After stating his problem Catalan writes 


On peut faire dix-huit hypothèses sur le nombre et la nature des
boules tirées. Chacune donne lieu à une probabilité de l’événement
supposé; d’où l’on déduit, par le théorème de Bayes, la
probabilité de cette hypothèse, etc. [p. 467]


Yet in the short calculation following the table listing these eighteen hy- 
potheses (the nineteenth hypothesis, under which all balls are drawn, is 
omitted “parce que la probabilité correspondante serait nulle” (loc. cit.)), 
only the Theorem of Total Probability is used. In an attempt to see where, 
if at all, Bayes’s Theorem (in the discrete form) is used, we shall consider 
Catalan’s argument in some detail. 

Let E denote the drawing of a white ball from the modified urn; let $H_i$,
$i \in \{1,2,\ldots,6\}$, denote the event that i balls are chosen from the urn A
on the first draw, and let ${}_jW_i$ denote the drawing of j white balls when i
are chosen: thus, for example, ${}_1W_2$ will denote the initial drawing from
A of two balls, (exactly) one of which is white. The possible results, and
their probabilities, may be summarized in the adaptation of Catalan’s table
given here. Notice that

$$\begin{aligned}\Pr[E \mid H_1] &= \Pr[E \wedge {}_0W_1 \mid H_1] + \Pr[E \wedge {}_1W_1 \mid H_1]\\ &= \Pr[E \mid {}_0W_1 \wedge H_1]\,\Pr[{}_0W_1 \mid H_1] + \Pr[E \mid {}_1W_1 \wedge H_1]\,\Pr[{}_1W_1 \mid H_1]\\ &= 4/7,\end{aligned}$$




$\Pr[{}_jW_i \mid H_i]$ | Modified urn | $\Pr[E \mid {}_jW_i \wedge H_i]$


Table 1. Draws from an urn, following Catalan. 


and similarly it can be shown that each $\Pr[E \mid H_i] = 4/7$. Thus, since the
$\{H_i\}$ are presumably mutually exclusive and exhaustive,

$$\Pr[E] = \sum_i \Pr[E \mid H_i]\,\Pr[H_i] = 4/7,$$

which agrees with Catalan’s solution.

It remains only to explore the reference to the use of Bayes’s Theorem.
Note firstly that for each $i \in \{1,2,\ldots,6\}$,

$$\Pr[H_i \mid E] = \frac{\Pr[E \mid H_i]\,\Pr[H_i]}{\sum_j \Pr[E \mid H_j]\,\Pr[H_j]} = \Pr[H_i],$$

and so the prior and posterior probabilities of the hypotheses coincide.
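The eighteen cases of Catalan's example, and the conditional probabilities built from them, can be generated mechanically. The sketch below assumes only the stated composition of the urn (4 white, 3 black) and sampling without replacement; the function name `catalan_table` and the row layout are illustrative and do not reproduce Catalan's own table.

```python
from fractions import Fraction
from math import comb

WHITE, BLACK = 4, 3
TOTAL = WHITE + BLACK

def catalan_table():
    """Rows (i, j, Pr[j white | i drawn], Pr[E | j white of i drawn])."""
    rows = []
    for i in range(1, TOTAL):                          # i = 1, ..., 6 removed
        for j in range(max(0, i - BLACK), min(WHITE, i) + 1):
            p_draw = Fraction(comb(WHITE, j) * comb(BLACK, i - j),
                              comb(TOTAL, i))
            p_white = Fraction(WHITE - j, TOTAL - i)   # from the modified urn
            rows.append((i, j, p_draw, p_white))
    return rows

rows = catalan_table()
conditional = {i: sum(pd * pw for k, _, pd, pw in rows if k == i)
               for i in range(1, TOTAL)}
print(len(rows))        # number of hypotheses on number and nature of balls
print(conditional)      # Pr[E | H_i] for each i
```

Because every $\Pr[E \mid H_i]$ takes the same value, any prior over the number of balls removed gives the same total probability and leaves the posterior over the $H_i$ equal to the prior.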

A discussion of the general case was given by Jongmans and Seneta in
1994: the following owes much to that paper. Let A be an urn containing a
non-random number $N_0$ of balls, $X_0$ (also non-random) of which are white
and the remainder black. Only the perverse would then refuse to accept
the statement that the probability $p_A$ of drawing a white ball from A is
$X_0/N_0$. Now let $N_1$ balls, $X_1$ of which are white, be drawn at random and
without replacement from the urn A and placed in the urn B. If $X_1$ and
$N_1$ are known, then the probability of drawing a white ball from B (event
$W_B$) is

$$\pi_B = \Pr[W_B] = X_1/N_1.$$

But in Catalan’s case $X_1$ is unknown (or random), and $N_1$ is possibly also
random. Thus the probability that a white ball is drawn from B is actually
given by

$$p_B = E[\pi_B] = E[X_1/N_1].$$

The initial drawing taking place without replacement, knowledge of the
hypergeometric distribution shows that

$$E[X_1 \mid N_1] = N_1X_0/N_0,$$

and hence

$$E[X_1/N_1 \mid N_1] = X_0/N_0.$$

Averaging over all values of $N_1$ yields

$$p_B = E[X_1/N_1] = E[E(X_1/N_1 \mid N_1)] = X_0/N_0,$$

or the same value as was found for $p_A$. Note here that $N_1$ has a distribution
over $\{1,2,\ldots,N_0\}$, where we may have $\Pr[N_1 = N_0] > 0$.


Now consider urn A after the transference of balls has occurred, and let

$$\pi_A = (X_0 - X_1)/(N_0 - N_1).$$

Suppose that $N_1$ is now a random variable on $\{1,2,\ldots,N_0-1\}$, so that at
least one ball remains in A (cf. Catalan’s result)⁶⁷. Then $\Pr[N_1 = N_0] = 0$,
and

$$E[\pi_A \mid N_1] = \frac{X_0 - E[X_1 \mid N_1]}{N_0 - N_1} = \frac{X_0 - (N_1X_0/N_0)}{N_0 - N_1} = X_0/N_0,$$

and hence

$$p_A = E[\pi_A] = X_0/N_0.$$
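The averaging argument can be illustrated with an arbitrary distribution for the number of balls transferred. In the sketch below the weights attached to $N_1$ are purely hypothetical choices made for the illustration, and the function names are not taken from Jongmans and Seneta.

```python
from fractions import Fraction
from math import comb

def hypergeometric(N0, X0, N1, x1):
    """Pr[X1 = x1] when N1 balls are drawn without replacement from A."""
    return Fraction(comb(X0, x1) * comb(N0 - X0, N1 - x1), comb(N0, N1))

def prob_white_from_B(N0, X0, weights):
    """p_B = E[X1 / N1], with weights a dict mapping N1 to Pr[N1]."""
    return sum(w * hypergeometric(N0, X0, N1, x1) * Fraction(x1, N1)
               for N1, w in weights.items() for x1 in range(N1 + 1))

def prob_white_from_A(N0, X0, weights):
    """p_A = E[(X0 - X1) / (N0 - N1)]; every N1 must be less than N0."""
    return sum(w * hypergeometric(N0, X0, N1, x1)
               * Fraction(X0 - x1, N0 - N1)
               for N1, w in weights.items() for x1 in range(N1 + 1))

# Hypothetical distributions for N1 (assumed here for illustration only).
weights_B = {1: Fraction(1, 6), 3: Fraction(1, 2), 7: Fraction(1, 3)}
weights_A = {2: Fraction(1, 4), 5: Fraction(3, 4)}
print(prob_white_from_B(7, 4, weights_B), prob_white_from_A(7, 4, weights_A))
```

Both values reduce to $X_0/N_0$ whatever weights are chosen, provided that at least one ball is transferred in the first case and at least one ball is left behind in the second.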


Mention has already been made (see Note 62) of the 1835 paper by one
Bénard, “élève de l’école polytechnique”, but in view of the result obtained
by Catalan it might perhaps be wise to say something briefly about the
problem considered in this paper. The question (of which Catalan’s is a
generalization) examined is the following:


Une première urne A contient n boules blanches et n boules
noires; on en tire n boules au hasard que l’on place dans une
seconde urne B. De B, dont la composition nous est inconnue,
on en tire successivement (n − p), et l’on reconnaît qu’elles sont
toutes blanches. On demande la probabilité de tirer de B une
nouvelle boule blanche, les (n − p) ayant été mises à part à
mesure qu’on les tirait. [p. 264]


By an argument not totally different to Catalan’s Bénard deduces that 
the required probability is 1/2, i.e. n/(2n), which agrees (at least when 
p = 0) with the answer of 4/7 obtained in the specific example considered 
by Catalan. The inappropriateness of the assumption of a uniform prior on 
the possible original composition of A and the consequent wrong conclusion 
are discussed by Jongmans and Seneta [1994]. 

Results similar to those instanced above were given by Catalan in a
paper of 1884, where he again stated his nouveau principe. As one of his
applications he mentions the following:


Une urne A contenait, primitivement, s boules. Il en est sorti m
blanches, m’ non-blanches. Quelles sont les probabilités d’extraire,
soit une boule blanche, soit une boule non-blanche, de l’urne
modifiée?
Réponse:

$$\frac{m+1}{m+m'+2}, \qquad \frac{m'+1}{m+m'+2}.$$

Ces probabilités sont les mêmes que celles d’extraire, soit une
blanche, soit une noir, d’urne B contenant m + 1 blanches et
m’ noires. [p. 73]


In a footnote Catalan declares that s is supposed to be known here. The 
question was generalized in the third problem as follows: 


Une urne A contenait, primitivement, s boules. On en a tiré, au
hasard, b blanches, n non-blanches. Quelle est la probabilité P
d’extraire b’ blanches, n’ non-blanches, de l’urne modifiée?
Réponse:

$$P = \frac{C_{b+b',\,b'} \times C_{n+n',\,n'}}{C_{b+n+b'+n'+1,\,b'+n'}}.$$

[pp. 73-74]




The fourth memoir of Catalan’s to warrant our attention was published 
in 1886, under the title “Problemes et théorémes de probabilités”. Here 
Catalan considers a generalization of the following problem considered by 
Poisson in Article 32 of his Recherches sur la probabilité des jugements: 


On sait qu’une urne renfermait m boules, blanches ou noires; 
on en a tiré une blanche: et l’on demande quelle est la proba- 
bilité de l’extraction d’une nouvelle boule blanche, la premiére 
n’ayant pas été remise dans l’urne. [p. 3] 


This problem Catalan proposes to solve, unlike Poisson, by a method “qui 
supprime les longs calculs nécessités par le théorème de Bayes” [p. 3]. To
this end the first section of the memoir is devoted to some combinatorial 
formulae, the second section beginning with the following problem: 


Une urne A contenait, primitivement, s boules. On en a tiré, au 
hasard, m boules blanches, m’ boules non blanches. Quelle est 
la probabilité d’extraire, de l’urne modifiée, une nouvelle boule 
blanche? [p. 7] 


According to Bayes’s Theorem, the probability $\omega_k$ that the urn contains
(m+k) white balls, supposing always that sampling is without replacement,
is

$$\omega_k = \binom{m+k}{m}\binom{m'+p-k}{m'}\Big/\binom{s+1}{p},$$

where p = s − m − m′. Now if k of the p balls remaining in the urn are
white, then the probability of drawing a further white ball will be k/p, and
hence the required probability P will be given by

$$P = \sum_{k=0}^{p}(k/p)\,\omega_k,$$

an expression that some combinatorial prestidigitation reduces to

$$P = (m+1)/(m+m'+2),$$

independent of s. This result is summarized in the following theorem⁶⁸:


Si, d’une urne A, contenant s boules, il est sorti m boules
blanches, m’ boules non blanches; la probabilité de l’extraction
d’une nouvelle boule blanche est égale à la probabilité d’extraire
une boule blanche d’une urne B, contenant m + 1 boules blanches
et m’ + 1 boules noires. [p. 9]

This is followed in turn by some simple corollaries.

Recalling his aphorism “si un long calcul améne un résultat simple, il est 
inutile” [p. 9], Catalan notes that, in the case of the drawing of a further 
white ball from an urn that has already yielded m white and m’ non-white 
balls, 


La probabilité P, de cet événement, ne sera pas alterée, si les 
causes dont il dépend subissent des modifications inconnues. 


[p. 9] 


P will thus remain unaltered if 1, 2, ... or even (s − m − m′ − 1) balls are set
aside. One may therefore consider the replacement of urn A by a fictitious
urn B initially containing (m + m′ + 1) balls. After the drawing of the
(m + m′) balls, two hypotheses may be entertained about the composition
of B, viz.


$H_1$: m white and (m′ + 1) non-white balls; or


$H_2$: (m + 1) white and m′ non-white balls.


The probabilities of these hypotheses being respectively proportional to
(m′ + 1) and (m + 1), one finds that

$$\omega_1 = (m'+1)/(m+m'+2), \qquad \omega_2 = (m+1)/(m+m'+2),$$

and since $H_1$ is incompatible with the drawing of a further white ball, $H_2$
necessarily holds. Thus $\omega_2$ is in fact the desired probability P.
An extension of this result is obtained in the next problem: 


Une urne A contenait, primitivement, s boules. On en a tiré, 
au hasard, b blanches, n non blanches. Quelle est la probabilité 
P d’extraire b’ blanches, n’ non blanches, de l’urne modifiée? 


[p. 10] 


Proceeding as before Catalan obtains the value

$$P = \binom{b+b'}{b'}\binom{n+n'}{n'} \Big/ \binom{b+b'+n+n'+1}{b'+n'},$$

the same as the result given by “la méthode classique”. Several particular
cases follow.
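
The agreement between the finite-urn result and the closed form can be
illustrated by direct enumeration. The sketch below is an illustration
under stated assumptions, not Catalan's derivation: it assumes a uniform
prior over the number W of white balls originally in the urn, and the
helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def closed_form(b, n, b2, n2):
    """The binomial expression for P, with b2 and n2 standing for b', n'."""
    return Fraction(comb(b + b2, b2) * comb(n + n2, n2),
                    comb(b + b2 + n + n2 + 1, b2 + n2))


def finite_urn_exact(s, b, n, b2, n2):
    """Exact P for an urn of s balls; requires s >= b + n + b2 + n2."""
    num = Fraction(0)
    den = Fraction(0)
    for white in range(s + 1):          # uniform prior over W = 0, ..., s
        # Likelihood of the first sample, up to a factor common to all W.
        first = comb(white, b) * comb(s - white, n)
        if first == 0:
            continue
        # Chance of b2 white and n2 non-white from the s - b - n left.
        second = Fraction(comb(white - b, b2) * comb(s - white - n, n2),
                          comb(s - b - n, b2 + n2))
        num += first * second
        den += first
    return num / den
```

For any admissible s the two functions return the same fraction, which is
the sense in which the answer does not depend on the size of the urn.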

In the next problem urns containing balls of any one of three colours are 
considered, and this is extended in the following problem to f possibilities. 

In an Addition to his paper Catalan points out that a thing may be
modified “soit en l’unissant à une chose de même nature, soit en supprimant
quelqu’une de ses parties” [p. 15]. His new principle, he observes, is not
applicable in the case of modifications of the first type, and as an example
he considers the question of the drawing of balls of various colours from an
urn whose initial composition is known and to which a further n balls, of
unknown shades, are added. Indeed, if the urn initially contained a white,
b black, and c red balls, the probability, after the addition of the n balls,
of drawing from the urn of size s = n + a + b + c, a, b and c balls coloured
white, black and red respectively, is


p= LCE YG): 


independent of the actual values a, b,c. 

In 1888 Catalan’s paper “Sur une application du théorème de Bayes, 
faite par Laplace” appeared — a paper, as we shall see, in which many 
of his earlier results are rehearsed. Here he states the “Principe” given 
in Laplace’s memoir of 1774, and notes that Laplace stated this result 
“sans nommer Bayes” — a fact that is perhaps hardly surprising, since the 
proposition is not in fact found in Bayes’s Essay. Laplace then, as Catalan 
notes, applied this result (in his “Problème 1”) to the problem of finding 
the probability P of drawing a white ball from an urn containing an infinite 
number of white and black balls, if (p + q) draws have already resulted in 
p white and q black, the solution being given by 


P=(p+1)/(p+q+2). (34) 


In musing on this result Catalan was apparently struck by a multitude 
of questions, among which he mentions the following: 


(i) Why was Laplace not struck by the simplicity of this result? 


(ii) Why did he not perceive that his calculation, so simple in the case of 
an infinite number of balls, would become prolix and tedious if one 
supposed the number of balls to be ten thousand, for example? 


(iii) Why did he not ask if his formula (34) would not hold in the case of 
any number whatsoever, greater than (p+ q), of balls? 


Here Catalan proposes to consider the following general problem: 


Une urne A contenait, primitivement, s boules. On en a tiré, au 
hasard, m boules blanches, m’ boules non blanches. Quelle est 
la probabilité d’extraire, de l’urne modifiée, une nouvelle boule 
blanche? [p. 256] 


The event expected (“l’événement attendu”) is then defined as the draw- 
ing of a white ball from the urn of (s — m — m’) balls of various colours 
in unknown proportions. Basic to the solution presented is the following 
observation (from his paper of 1877): 


La probabilité P, de cet événement, ne sera pas altérée, si les 
causes dont il dépend subissent des modifications inconnues. 
[p. 256] 
It thus follows that P is unchanged if 1, 2, ..., (s − m − m' − 1) balls from
the original urn are placed, unseen, to one side.

This, however, as we have noted before, is tantamount to replacing the
original urn A by a fictitious urn B containing (m + m' + 1) balls, of which
m are white and m' non-white. The urn B may then have either of the
following compositions:


$H_1$: m white and (m' + 1) non-white balls; or


$H_2$: (m + 1) white and m' non-white balls,

with $w_1 = \Pr[H_1] \propto (m' + 1)$ and $w_2 = \Pr[H_2] \propto (m + 1)$. Thus

$$w_1 = (m'+1)/(m+m'+2) \quad\text{and}\quad w_2 = (m+1)/(m+m'+2).$$


Since $H_1$ is incompatible with the observed event, the second must in fact
obtain. Thus the desired probability is


$$P = w_2 = (m+1)/(m+m'+2),$$


which agrees with that given in (34) above. 

As a final relevancy from this paper we may cite the extension made to
sampling from an urn containing balls of k colours. If $m_i$ balls of colour i
have been obtained, the probability that the next draw will yield a ball of
the j-th colour is


$$(m_j + 1)/(m_1 + \cdots + m_k + k), \qquad j \in \{1, 2, \ldots, k\}.$$


As a postscript Catalan points out that if the balls (b white, n black) from
an urn A are distributed, unseen, among urns $B_1, B_2, \ldots, B_k$, the
probability of drawing a white ball from any of these auxiliary urns will be
b/(b + n), unless k > b + n.

Several comments on this paper come to mind. The first is to note that 
a similar discussion of the finite urn was given by Terrot (see §8.18), with 
later and more detailed discussion by Keynes [1921, chap. XXX, §11] and 
Burnside [1928], though the latter two authors concentrate mainly on the 
case of sampling with replacement, while Catalan’s concern is with sam- 
pling without replacement. 

Secondly, as Burnside (op. cit.) has pointed out, the assumption that 
all of n results are equally likely is not the same as requiring that each 
two of the n results are equally likely. The latter has been shown by this 
author to be the appropriate assumption to be made in questions of the 
type discussed by Catalan, and it appears that this should be taken into 
account in the latter’s work. 

Thirdly, the extension to balls of k colours was, as we have already seen, 
given by Lubbock and Drinkwater-Bethune [c.1830]. Ignorance of this ex- 
tension led Kneale [1949, pp. 203-204] to a vain attempt at confutation of 
the rule of succession. 


8.9 Jacob Friedrich Friess (1773-1843) 


In the second chapter, “Berechnung der Wahrscheinlichkeit, wenn die
Theilung der Sphäre in ihre gleichmöglichen Fälle selbst erst errathen
werden muß, oder Bestimmung der Wahrscheinlichkeit a posteriori” of the
first section “Reine Theorie der Wahrscheinlichkeitsrechnung” of his book
Versuch einer Kritik der Principien der Wahrscheinlichkeitsrechnung of 1842,
Friess⁶⁹ gives the expression


ee [ ema-ayrae, 


and points out (though not in so many words) that this holds for a uni- 
form prior. He also deduces the rule of succession. No mention of Bayes or 
Laplace is to be found here. 


8.10 Antoine Augustin Cournot (1801-1877) 


The eighth chapter⁷⁰ of Cournot’s Exposition de la Théorie des Chances
et des Probabilités of 1843 is devoted to a study of posterior probabilities.
Some slight misunderstanding of Bernoulli’s Theorem seems evident here,
however, for in writing of the need for the determination “par l’expérience,
ou à posteriori” [p. 154] of chances according to data, Cournot writes


le principe de Jacques Bernoulli conduit à cette détermination
expérimentale: car si, en désignant par x la chance inconnue de
la production d’un événement, par n le nombre de fois que cet
événement est arrivé en m épreuves, on peut toujours obtenir
une probabilité P que l’écart fortuit x − n/m tombe entre les
limites ±l (le nombre l et la différence 1 − P tombant au-dessous
de toute grandeur assignable, pourvu que les nombres
m, n soient suffisamment grands), il est clair que, si rien ne limite
le nombre des épreuves, la probabilité x peut être déterminée
avec une précision indéfinie; qu’on peut arriver, par exemple,
à être sûr qu’il n’y a pas, entre le rapport n/m donné par
l’expérience et le nombre inconnu x, une différence d’un
cent-millième. [pp. 154-155]


In view of the assumption here that x is unknown, the description seems 
more applicable to Bayes’s Theorem than Bernoulli’s, though it is not clear 
whether Cournot viewed the former as anything more than an extension of 
the latter. 

Having noted that Bernoulli’s work enables one to pass on immediately 
to scientific applications, Cournot remarks that 
une règle dont le premier énoncé appartient à l’Anglais Bayes,
et sur laquelle Condorcet, Laplace et leurs successeurs ont voulu
édifier la doctrine des probabilités à posteriori, est devenue la
source de nombreuses équivoques qu’il faut d’abord éclaircir,
d’erreurs graves qu’il faut rectifier, et qui se rectifient dès qu’on
a présente à l’esprit la distinction fondamentale entre les
probabilités qui ont une existence objective, qui donnent la mesure
de la possibilité des choses, et les probabilités subjectives,
relatives en partie à nos connaissances, en partie à notre ignorance,
variables d’une intelligence à une autre, selon leurs capacités et
les données qui leur sont fournies. [p. 155]


Several “urn-and-balls” examples now follow: in the first of these Cournot 
considers urns of three constitutions, viz. 


Type 1: three white balls; 
Type 2: one black and two white balls; 
Type 3: one white and two black balls. 


He supposes too that there are the same numbers of each type of urn 
(not necessarily only one of each, as one usually finds). An urn having been 
chosen at random, a ball is chosen, also at random, from that urn: it turns 
out to be white. The answer to the question “what are the probabilities
that this (white) ball came from urns of types 1, 2 and 3?” is obtained,
however, only after several pages of what is at times a somewhat rambling
argument, during the course of which Cournot states Bayes’s Theorem as
follows⁷¹:


Les probabilités des causes ou des hypothèses sont proportionnelles
aux probabilités que ces causes donnent pour les événements
observés. La probabilité de l’une de ces causes ou hypothèses est
une fraction qui a pour numérateur la probabilité de l’événement
par suite de cette cause, et pour dénominateur la somme des
probabilités semblables relatives à toutes les causes ou hypothèses.


[p. 158] 
Thus understood, he goes on to point out, 


la règle de Bayes est un théorème qui ne donne lieu à aucune 
équivoque, et dont on ne peut contester la justesse [p. 158], 


although a scant three pages before (as we have already seen), in writing of 
this rule on which Condorcet, Laplace and their successors had wished to 
build the theory of a posteriori probabilities, Cournot had drawn attention 
to the ambiguities and the grave errors resulting from the misuse of this 
rule, the rectification of which misuse called for a distinction between
objective and subjective probabilities⁷².
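
Cournot's statement of the rule can be applied directly to his first
urn example. The following sketch is an illustration only: it takes the
three types of urn to be equally numerous, as the text states, and the
function name is a hypothetical one.

```python
from fractions import Fraction


def posterior_over_causes(priors, likelihoods):
    """Each cause's share of the total probability of the observed event."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Equal numbers of urns of each type, hence equal prior probabilities.
priors = [Fraction(1, 3)] * 3
# Chance of drawing a white ball from an urn of type 1, 2 and 3.
white_chance = [Fraction(3, 3), Fraction(2, 3), Fraction(1, 3)]

posterior = posterior_over_causes(priors, white_chance)
```

With equal priors the posterior probabilities are simply proportional to
the chances of a white ball, which is the proportionality asserted in the
first sentence of Cournot's statement.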
As an illustration of the use of Bayes’s Theorem in the subjective theory
Cournot considers three players whose probabilities of winning a game are
in the ratio 3 : 2 : 1. These probabilities will vary from one individual to
another, depending on knowledge. In this subjective setting Bayes’s rule 


n’a donc d’autre utilité que celle de conduire à une fixation de
paris, dans une certaine hypothèse sur les choses que connaît et
sur celles qu’ignore l’arbitre. [p. 160] 


Noting next that 


Dans les applications qu’on entend faire ordinairement de la
règle de Bayes, on ne sait absolument rien sur la constitution
de l’urne [pp. 161-162],


Cournot passes to the consideration of an urn containing an infinite number 
of balls, and hence to the continuous version of Bayes’s Theorem. From this 
it is but a simple step to the derivation of 


$$x^{n}(1-x)^{m-n}$$


as the ordinate of “la courbe de probabilité” [p. 162] of the value x of the 
chance of the extraction of a white ball from an urn from which n white 
and m — n black balls have been drawn (with replacement). Then 


La valeur moyenne, qui exprime aussi la probabilité de l’extraction
d’une boule blanche dans un tirage subséquent [p. 163]


is given (without derivation) as OG = (n + 1)/(m + 2), the maximum
ordinate of the curve being OK = n/m (see Figure 8.1).

Useful though Bayes’s Theorem might be, Cournot is unable to recom- 
mend its use unreservedly; and in considering, as an illustration, the chance 
of the conception of an infant of one or the other sex, he concludes that, in 
the absence of sufficient data on the numbers of first-born that are male, 
the numbers of times in which the birth of a male has been followed by 
that of a female, etc.⁷³,


l’application de la règle de Bayes ne conduirait, ... qu’à une 
conséquence futile ou illusoire. [p. 165] 


I have already hinted that Cournot was perhaps not altogether clear on 
the distinction between Bayes’s and Bernoulli’s Theorems. This opinion is 
strengthened by our reading Cournot’s Article 95. Here it is supposed that 
the m and n mentioned earlier become very large, in which case the points 
K and G (see Figure 8.1) are to all intents and purposes coincident, and 


le résultat trouvé par la règle de Bayes ne diffère plus sensiblement
de celui que donnerait le théorème de Bernoulli.
[p. 166] 
FIGURE 8.1. A probability curve for drawings from an urn. 


FIGURE 8.2. A posterior probability curve. 
While one might take this at first sight to mean nothing more than that

$$(n+1)/(m+2) \approx n/m$$


for large values of m and n, the subsequent discussion suggests that rather 
more is intended. For Cournot writes 


la vérité du théorème de Bernoulli est indépendante de toute
hypothèse sur le triage préalable de l’urne. Ce n’est point dans
ce cas (comme beaucoup d’auteurs ont paru se le figurer) la règle
de Bernoulli qui devient exacte en se rapprochant de la règle de
Bayes; c’est la règle de Bayes qui devient exacte, ou qui acquiert
une valeur objective qu’elle n’avait pas, en se confondant avec
la règle de Bernoulli. [p. 166]


This comment is then substantiated as follows: let n white balls be obtained 
in m draws from an urn. Bayes’s rule then gives the probability 


$$P = \Pr\left[\frac{n}{m} - l < x < \frac{n}{m} + l\right], \qquad (35)$$


where x is the chance that a white ball is drawn. (See Figure 8.2, where
Kk is the ordinate representing the maximum value of the curve OB, KI
and KL are lengths equal to l, and P is represented by the ratio of the
area ILlki to the total area OBlki.) Cournot notes, in passing, that, for a
fixed value of P, l decreases as n and m increase. Now


Quand la chance de mettre la main sur une urne pour laquelle 
la chance d’extraction d’une boule blanche est x, reste la méme 
quel que soit x, la probabilité P a une valeur objective. [p. 167] 


In other words (as Cournot goes on to say), if after having chosen an urn 
at random and then having obtained n white balls in m draws from it, I 
judge that the chance x of the appearance of a white ball from this urn lies 
between the limits given in (35) above, and if I repeat this judgement for 
N similar results from N different urns, then it is obvious that the ratio of 
the number of correct judgements to the number of incorrect ones is as the 
ratio of P to 1 − P. 

Suppose next that the chance of choosing an urn varies with the value
of x for that urn (see Figure 8.3, where o’k’b’ represents the probability law
of x, and where OI’, OK’ and OL’ are respectively (n/m) − l, n/m and
(n/m) + l. Further, let K’k’ mark the minimum of the curve). Examination
of this figure then persuades Cournot that it may turn out that, in a large
number N of repetitions as detailed above, the number of cases in which x
falls outside the limits given in (35) certainly exceeds the number of cases
in which x falls within those limits, although P may always exceed 1/2,
or even be near to 1. If one now judges that, for one of these urns, chosen at
random, x lies between the limits in (35), then the chance of an error is no
longer 1 − P but another fraction 1 − P’, which may well differ from 1 − P.
This new fraction is unknown as long as the probability law represented by
o’k’b’ is unknown.


FIGURE 8.3. A non-uniform prior probability curve.


La probabilité P, conclue de la règle de Bayes, ne peut plus
être prise que dans un sens subjectif, comme servant à régler
les conditions d’un pari, tant que nous ne possédons aucune
donnée sur la forme de la courbe o’k’b’. [p. 169]


Cournot now turns his attention to Bernoulli’s Theorem. If m and n are 
again large, it follows from this result that, for values of x like OE” that 
fall far below OI’, the event consisting of the obtaining of n white balls 
in m draws is almost impossible (and similarly for values of x very much 
bigger than OL’). 


On n’aurait donc plus à considérer, dans le calcul du nombre
P’ désigné plus haut, si la courbe o’k’b’ était donnée, que la
portion de cette courbe voisine de k’, pour laquelle l’ordonnée
a une valeur peu différente de K’k’ ou de n/m. [p. 169]


Since, in the neighbourhood of k’, the ordinate varies only slightly, and 
since only ordinates in this region have any appreciable effect on the value 
of P’, 


l’erreur que l’on commet en supposant implicitement cette
ordonnée constante, suivant la règle de Bayes, est une erreur
très-petite. [pp. 169-170]
Thus P may be used as an approximation to P’, which gives the (subjective) 
probability P an objective value, independent of the form of the unknown 
probability function. 

Cournot now claims that the probability P of expression (35) above is a
function of

$$t = l\,m\sqrt{\frac{m}{2n(m-n)}}. \qquad (36)$$

This indeed follows from an earlier part of his book and the de Moivre-
Laplace limit theorem in the form

$$P = \Pr\left[\,|x - (n/m)| < l\,\right] \approx \frac{2}{\sqrt{\pi}} \int_0^t e^{-y^2}\,dy,$$

with $t = l\sqrt{m/2x(1-x)}$ (see also §8.4). Substitution of n/m for x yields
the value for t given in (36).
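
The quality of this approximation is easily examined numerically. The
sketch below is mine rather than Cournot's: it assumes a uniform prior,
so that the posterior ordinate is proportional to x^n (1 − x)^(m − n), and
it compares the error-function value with a crude midpoint integration.
The function names and the choice of 20000 steps are arbitrary.

```python
import math


def cournot_t(n, m, l):
    """The quantity t of (36); requires 0 < n < m."""
    return l * m * math.sqrt(m / (2 * n * (m - n)))


def approx_P(n, m, l):
    """(2 / sqrt(pi)) * integral from 0 to t of exp(-y^2) dy, i.e. erf(t)."""
    return math.erf(cournot_t(n, m, l))


def exact_P(n, m, l, steps=20000):
    """Posterior probability that |x - n/m| < l, by midpoint integration."""
    def log_ordinate(x):
        return n * math.log(x) + (m - n) * math.log(1 - x)

    peak = log_ordinate(n / m)          # rescale to avoid underflow
    h = 1.0 / steps
    inside = total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        y = math.exp(log_ordinate(x) - peak)
        total += y
        if abs(x - n / m) < l:
            inside += y
    return inside / total
```

For n = 600, m = 1000 and l = 0.03, for instance, the two functions may be
compared directly; the agreement improves as m and n increase.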


The rest of this chapter (some 10 pages) of Cournot’s is devoted to the 
consideration of problems similar to that we have just discussed, in which 
repeated series of experiments of the kind we have considered are exam- 
ined, and in which the balls are not only of two different colours but are 
also marked with one or other of two different letters. We shall not pursue 
the matter further here. 

The next use Cournot makes of Bayes’s Theorem is in connexion with
astronomy. He shows that the posterior probability that the chance p of a
direct motion for each of the 11 planets⁷⁴ exceeds 1/2 is
$(2^{12} - 1)/2^{12}$, a value presumably computed from


$$\Pr[p > 1/2] = \int_{1/2}^{1} x^{11}\,dx \Big/ \int_0^1 x^{11}\,dx.$$


If, however, one permits only the prior values 1/2 and 1 for p, or
(equivalently) one entertains only the two (equiprobable) hypotheses


$H_1$: the motions are necessarily direct, and
$H_2$: the motions are equally (or indifferently) direct or retrograde,


and if one denotes by E the observation that all 11 planets have a direct
motion, then


$$\Pr[H_1 \mid E] = \frac{\Pr[E \mid H_1]\Pr[H_1]}{\Pr[E \mid H_1]\Pr[H_1] + \Pr[E \mid H_2]\Pr[H_2]}
= \frac{1}{1 + 1/2^{11}} = \frac{2^{11}}{2^{11} + 1}.$$
(I assume this is Cournot’s reasoning: only the answer is given in his book.)
Finally, consideration of the planetary elements quoted by Cournot shows
that, for each planet other than the Earth, the longitude of the ascendant
node is between 0° and 180°. Bayes’s Rule then shows that the probability
that non-independent causes have favoured the concentration of the
ascendant planetary nodes in that half of the ecliptic where the longitudes
are less than 180°, is $(2^{11} - 1)/2^{11}$, a value that is presumably arrived
at in a similar way to an earlier result.
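
The two routes to these planetary values may be set out side by side. The
sketch below is a conjectural reconstruction, in the same spirit as the
reasoning attributed to Cournot above: the first function assumes a uniform
prior for p, the second the two equiprobable hypotheses, and both names are
illustrative.

```python
from fractions import Fraction


def prob_chance_exceeds_half(n_agree):
    """Pr[p > 1/2] after n_agree concordant observations, uniform prior.

    The ratio of the integral of x^N over (1/2, 1) to that over (0, 1)
    is 1 - (1/2)^(N + 1).
    """
    return 1 - Fraction(1, 2) ** (n_agree + 1)


def prob_necessary_cause(n_agree):
    """Pr[H1 | E] when H1 (p = 1) and H2 (p = 1/2) are equiprobable."""
    like_h1 = Fraction(1)
    like_h2 = Fraction(1, 2) ** n_agree
    # The equal prior probabilities cancel from the ratio.
    return like_h1 / (like_h1 + like_h2)
```

With 11 concordant observations the first function gives the value found
for the direct motions; with 10 it gives that found for the nodes.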

Like all good authors of his time, Cournot was unable to write a book 
on probability without mentioning the role of chance in judgments and tes- 
timony: it is to this part of his treatise that we now turn our attention. 

In the first of the two chapters devoted to these topics (Chapter XV,
entitled “Théorie de la probabilité des jugements. — Applications à la
statistique judiciaire en matière civile”) basic formulae are established to
be used in later work on testimony.

Cournot begins his discussion by considering, as an example, the case of
a rustic who, at each setting of the sun, predicts the time of the next day’s
setting⁷⁵. If he is correct n out of m times, the fraction v = n/m expresses
the probability that another prognostication by that man will be correct.
Suppose now that we have two such observers A and B with probabilities
v and v’. Then


1°. The probability that A and B agree in their judgement is

$$p = vv' + (1-v)(1-v'). \qquad (37)$$


2°. The probability that they disagree is

$$q = v(1-v') + v'(1-v) = 1 - p.$$


3°. The probability that the prognostication of the event on which A and
B agree is verified, is

$$\frac{vv'}{vv' + (1-v)(1-v')}. \qquad (38)$$


4°. The probability that A’s prognostication is verified while B’s is not
is

$$\frac{v(1-v')}{v(1-v') + v'(1-v)}.$$


Passing next to the case of three sons of the soil, Cournot deduces the
following obvious generalizations of the previous expressions:


$$p = 1 - (v + v' + v'') + vv' + vv'' + v'v''$$
$$a = v(1 - v' - v'') + v'v''$$
$$b = v'(1 - v - v'') + vv'' \qquad (39)$$
$$c = v''(1 - v - v') + vv'.$$


He then supposes that there is no criterion appropriate to the direct
determination of v, v’ and v’’, and suggests that, in such a case, these values
should be determined from the observed values of p, a, b and c, together
with the relationship

$$p + a + b + c = 1.$$


This determination is carried out by setting


$$v = \tfrac12 + z, \qquad v' = \tfrac12 + z', \qquad v'' = \tfrac12 + z'',$$

$$\alpha = a - \tfrac14, \qquad \beta = b - \tfrac14, \qquad \gamma = c - \tfrac14,$$

and by writing the expressions in (39) in terms of the new variables. It then
easily follows that


$$v = \tfrac12 \pm \sqrt{-\frac{(a+b-\tfrac12)(a+c-\tfrac12)}{2\,(b+c-\tfrac12)}},$$

with similar expressions for v’ and v’’. To ensure that v, v’ and v’’ are
all real, it is necessary either that (a + b − 1/2), (a + c − 1/2) and
(b + c − 1/2) are all negative, or that two of them are positive and the
third negative. Further, if v, v’ and v’’ are to be in the interval (0,1), it
is necessary that


$$a + b < 1, \qquad a + c < 1, \qquad b + c < 1.$$
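
The passage from the observed frequencies back to the individual
reliabilities can be illustrated as follows. This is a minimal sketch and
not Cournot's own working: the function names are hypothetical, floating
point arithmetic is used for convenience, and the judges are assumed to
decide independently.

```python
import math


def agreement_pattern(v1, v2, v3):
    """Return (p, a, b, c) as in (39) for three independent judges."""
    p = v1 * v2 * v3 + (1 - v1) * (1 - v2) * (1 - v3)
    a = v1 * (1 - v2 - v3) + v2 * v3
    b = v2 * (1 - v1 - v3) + v1 * v3
    c = v3 * (1 - v1 - v2) + v1 * v2
    return p, a, b, c


def recover_reliabilities(a, b, c):
    """Invert (39): return the two admissible systems of (v, v', v'')."""
    x = a + b - 0.5     # equals -2 z z'
    y = a + c - 0.5     # equals -2 z z''
    w = b + c - 0.5     # equals -2 z' z'' (must be non-zero)
    z_squared = -(x * y) / (2 * w)
    if z_squared <= 0:
        raise ValueError("no real solution with z different from zero")
    z1 = math.sqrt(z_squared)
    z2 = -x / (2 * z1)
    z3 = -y / (2 * z1)
    first = (0.5 + z1, 0.5 + z2, 0.5 + z3)
    second = (0.5 - z1, 0.5 - z2, 0.5 - z3)
    return first, second
```

The two systems returned are mirror images of each other about 1/2, so
that some further consideration is needed to choose between them.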


Having considered this general case, Cournot now turns his attention to 
matters judicial. Decisions of courts of first instance⁷⁶ (which have three
judges in the main) may be dealt with in the manner instanced before. For
if the court recorder takes note of the votes of each judge, values of a, b
and c may be obtained after a long sequence of cases, and these values will 
in turn allow the determination of v, v’ and v”. And while two systems 
of values for v, v’ and v” will be found, common sense, Cournot suggests 
(though not in so many words), will show that only one is admissible. 

He notes, incidentally, that all the results so far obtained depend upon
the assumption that the judges decide independently on the merits of a
case. Further, if v, v’ and v’’ are viewed a priori as equal, then


$$v = \tfrac12 \pm \tfrac12\sqrt{(4p-1)/3}, \qquad (40)$$


the value of v that is less than 1/2 being discarded. Thus it is enough to
know p, the ratio of the number of judgments in which unanimity was reached,
to the total number of judgments. Cournot further notes that (40) also holds
when judges are drawn from a list, where v now denotes the mean of the
true values v for each person on that list.

Various extensions and generalizations of these results are then given⁷⁷:
we shall mention only one, viz. that the probability $V_m$ of a favourable
judgment (one that is either unanimous or a simple majority) when there are
2m + 1 judges on the tribunal, and when the chance of any of the judges not
being mistaken is v, is given by


$$V_m = \sum_{i=0}^{m} \binom{2m+1}{i} v^{2m+1-i}(1-v)^{i}.$$
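
The sum is readily evaluated. The sketch below simply adds up the binomial
terms in which at most m of the 2m + 1 judges are mistaken; the function
name is illustrative and the judges are assumed to be independent and
equally reliable.

```python
from fractions import Fraction
from math import comb


def prob_correct_majority(m, v):
    """Chance that at least m + 1 of 2m + 1 judges are not mistaken."""
    size = 2 * m + 1
    # i counts the mistaken judges; a majority is right while i <= m.
    return sum(comb(size, i) * v ** (size - i) * (1 - v) ** i
               for i in range(m + 1))
```

For three judges (m = 1) each right with chance 3/4, for example,
prob_correct_majority(1, Fraction(3, 4)) returns 27/32, somewhat better
than the chance for a single judge.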


Cournot perhaps goes further than others of his period by actually citing 
figures (for the 1830’s) of judgments, and using his formulae to estimate 
certain values. 

In Chapter XVI, “Suite de la théorie de la probabilité des jugements.
— Applications à la statistique judiciaire, en matière criminelle. — De
la probabilité des témoignages”, Cournot turns his attention to another
branch of the judicial system.

Returning firstly to the pastoral problem considered earlier in the text, 
Cournot supposes that the two observers are endowed “au méme degré de 
perspicacité et d’expérience” [p. 381]. Then v = v’, and equation (37) above 
becomes 

$$p = 1 - 2v + 2v^2,$$


whence

$$v = \tfrac12 \pm \tfrac12\sqrt{2p - 1}.$$


If the sequence of observations is now divided into n categories, with $k_i$
denoting the proportion of observations in the ith category, i ∈ {1, 2, ..., n},
and if $v_i$ and $p_i$ are defined in the obvious way, then the true value of v
is given by $\sum k_i v_i$, and


$$p = \sum k_i p_i .$$


Cournot then considers the probability of a favourable judgment’s being 
given by a tribunal, and next turns his attention to the case in which the k; 
are unknown, proposing to determine them from observations. This clearly 
provides a link between statistics and judicial matters; or, in Cournot’s own 
words, 


C’est sur la solution d’un problème de cette nature que reposent
les applications de la théorie des chances à la statistique
judiciaire, en matière criminelle. [p. 388]
We shall not pursue the tortuous argument presented here any further: 
however, it is interesting to note the power that Cournot saw in his rea- 
soning: 


Ces explications ont l’avantage de fournir une définition précise
et mathématique du sens qui s’attache aux mots condamnable et
acquittable: elles font voir avec netteté comment la classification
des accusés en condamnables et acquittables se rapporte à l’état
des lumières, aux dispositions morales de la classe de citoyens
au sein de laquelle on prend les jurés ou les juges criminels; de
manière que, les juges venant à être pris dans une autre classe ou
à subir dans la même classe d’autres influences, telles catégories
d’accusés pourront passer de la classe des accusés condamnables
à celle des acquittables, ou réciproquement. [pp. 393-394]


How easy would the task of a judge be, it seems, were he only to know 
some simple probability theory! 

After some discussion of the effects of varying the number of judges, 
or of using juries with different majorities required for a “Guilty” verdict 
— a discussion illustrated by consideration of the numbers of those ac- 
cused of various types of crimes (and the number of convictions) in the 
1830’s — Cournot passes to a subsection entitled “De la probabilité des 
témoignages”. After the wealth (?) of detail on his work on judgments, 
even Cournot finds little to say here. Indeed, he writes 


Les longues explications dans lesquelles nous sommes entré au
sujet de la probabilité des jugements, ne nous laissent que peu
de chose à dire concernant la probabilité des témoignages.


[p. 410] 


Expressions (37) and (38) are repeated here, this time with p and v having
a testimonial interpretation. An extension to n witnesses is given, the
probability of the truth of the testimony being given as


$$\frac{v\,v'\,v'' \cdots v^{(n-1)}}{v\,v'\,v'' \cdots v^{(n-1)} + (1-v)(1-v') \cdots (1-v^{(n-1)})}.$$
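
The formula for concurring witnesses may be illustrated by a short
computation. The sketch is mine: it assumes, as the expression itself
implicitly does, that the witnesses testify independently and that the
fact attested is a priori as likely as not, and the function name is
hypothetical.

```python
from fractions import Fraction
from math import prod


def prob_testimony_true(reliabilities):
    """Chance that a fact attested by all the witnesses is true."""
    all_truthful = prod(reliabilities)
    all_mistaken = prod(1 - v for v in reliabilities)
    return all_truthful / (all_truthful + all_mistaken)
```

Two witnesses each reliable 9 times in 10 thus give 81/82 for the truth of
what they jointly assert, which is expression (38) with v = v’ = 9/10.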


Noting the desirability of incorporating into the calculations 


L’amour du merveilleux, l’entrainement des préjugés, l’exaltation 
de l’esprit de secte et de parti, tout ce qui met en jeu sympathies 
et les antipathies du cœur humain [p. 414] 


as well as the chance of prevarication or corruption, Cournot concludes by 
warning against the application of his theory to a chain of testimony. He 
writes 
nous nous garderons de vouloir appliquer le calcul à la
probabilité des faits réputés connus par une chaîne de témoignages,
ou par la tradition. Non-seulement les valeurs des éléments
qui entrent dans de tels calculs ne sont nullement assignables,
mais les combinaisons mêmes de ces éléments dans le calcul
reposent sur des hypothèses gratuites, par lesquelles on établit
une indépendance fictive entre des faits réellement solidaires,
et dont la solidarité répugne à toute application légitime de la
théorie des chances. [p. 415]


8.11 John Stuart Mill (1806-1873) 


The only pertinent result by this author, a writer justly famous for his 
economic, philosophic and logical works, is to be found in his A System of 
Logic, Ratiocinative and Inductive: Being a Connected View of the Prin- 
ciples of Evidence and the Methods of Scientific Investigation. First pub- 
lished in 1843, this work went through eight editions in all in Mill’s lifetime, 
each edition being carefully revised⁷⁸. So substantial were the alterations
adopted, after the publication of the first edition, in Book III, Chapter
XVIII, “Of the Calculation of Chances”, that Mill had the revision, together
with that of Chapter XXV, published as a separatum. Our comments here 
will be restricted to the first edition with reference, where relevant, to the 
eighth edition of 1872. 

Mill begins Book III, Chapter XVIII, by recalling Laplace’s definition of 
probability (given in the Essai philosophique sur les probabilités) as having 
reference partly to our ignorance and partly to our knowledge. Laplace’s 
requirement that events should be mutually exclusive and exhaustive and 
equally possible is found by Mill to be unsatisfactory: 


To be able to pronounce two events equally probable, it is not 
enough that we should know that one or the other must happen, 
and should have no ground for conjecturing which. Experience 
must have shown that the two events are of equally frequent 
occurrence. [1843: §2] 


This view was afterwards withdrawn⁷⁹, Mill concluding that 


the theory of chances, as conceived by Laplace and by math- 
ematicians generally, has not the fundamental fallacy which I 
had ascribed to it [1872: §1] 


and this in turn is based on his belief that 


the probability of an event is not a quality of the event itself, 
but a mere name for the degree of ground which we, or some 
one else, have for expecting it. The probability of an event to 
one person is a different thing from the probability of the same 
event to another, or to the same person after he has acquired 
additional evidence. [1872: §1] 


However Laplace’s definition is perhaps not altogether suitable when it 
comes to the application of the doctrine of chances to a scientific purpose, 
and Mill points out that the knowledge required in such a case “is that of 
the comparative frequency with which the different events in fact occur” 
[1872: §3]; and he professes further the (perhaps somewhat unorthodox) 
opinion that⁸⁰


The probability of events as calculated from their mere fre- 
quency in past experience, affords a less secure basis for practi- 
cal guidance, than their probability as deduced from an equally 
accurate knowledge of the frequency of occurrence of their causes. 
[1872: §4] 


I have described this method as “perhaps somewhat unorthodox”, for
Mill is apparently suggesting that if n out of N cases in our past
experience have yielded the event E, then n/N (or maybe some function of this
ratio) is perhaps not the best thing to take as a guide: yet this is the ratio
commonly used to estimate the probability Pr[E]. Let me try to elucidate
the argument as I see it.

Earlier in this chapter Mill asserts that the probability of an ace on one 
throw of a die is a sixth 


because we do actually know, either by reasoning or by experi- 
ence, that in a hundred, or a million of throws, ace is thrown in 
about one-sixth of that number, or once in six times. [1872: §3] 


He notes in the next section that whether reasoning or experience is used 
in the estimation of probabilities is a matter of no little importance. For if 
the ratio of the observed frequency of the occurrence of E to the observed
frequency of the non-occurrence of E is used,


the evidence is only that of the Method of Agreement, and the 
conclusion amounts only to an empirical law. [1872: §4] 


(The Method of Agreement — one of Mill’s Four Methods of Experimental 
Inquiry to be used in trying to find out those circumstances that are ac- 
tually connected by an invariable law to some or other phenomenon, these 
circumstances either preceding or following that phenomenon — is one in 
which different instances in which the phenomenon occurs are compared.) 

If we consider, on the other hand, the causes on which the occurrence or
non-occurrence of E depends, and estimate the ratio of the favourable to
the unfavourable causes, then, says Mill,


These are data of a higher order, by which the empirical law
derived from a mere numerical comparison of affirmative and
negative instances will be either corrected or confirmed, and in
either case we shall obtain a more correct measure of probability
than is given by that numerical comparison. [1872: §4]


Even in such a simple example as a “balls in a box” one, it is not just a
specific experience that leads to the estimation of probabilities, but rather
the stronger reasons of causation (see also §8.17 below). Indeed, the
happening of an event E provides grounds for our expecting E to happen
again, because this first occurrence proves that there exists, or may exist, a
cause that can produce it. (Price’s example of the rising of the sun, discussed
in §4.6, springs to mind in this connexion.⁸¹)

In §3 (§5 of the eighth edition) Mill quotes and proves the sixth principle
in Laplace’s Essai philosophique sur les probabilités. The discussion is
limited to two possible causes of an event, the posterior probabilities being
arrived at via frequentist considerations.

In the first edition Mill finds it “necessary to point out another serious 
oversight in Laplace’s theory” [§3]. He finds the preceding proposition un- 
tenable when its application is extended to cover hypotheses rather than 
causes, on the ground that the substitution of 


mere suppositions affording no ground for concluding that the 
effect would be produced, in the room of causes capable of pro- 
ducing it [1843: §3], 


would invalidate the theorem. This argument appears to rest upon the
assumption that


$$\Pr[H \mid E] = \Pr[H] \, \Pr[E \mid H] ,$$


and it is then hardly surprising that he concludes that “the proposition, as
thus stated, is an absurdity.” This passage was dropped from later editions.


8.12 Lambert Adolphe Jacques Quetelet 
(1796-1874) 


More renowned for his applications of statistics and probability theory to
social phenomena than for the development of theory⁸², Quetelet in fact
published three popular books on probability, viz. Quetelet [1828], [1846]
and [1853]. Of these, only the Lettres à S. A. R. le duc régnant de
Saxe-Cobourg et Gotha, sur la théorie des probabilités, appliquée aux sciences
morales et politiques of 1846 seems to have anything relevant to our topic,
and we shall accordingly restrict our attention here to it.

Noting that the more frequently an event has occurred under the same
circumstances, the more probable it becomes that this event was brought
about by one cause or by several simultaneous causes, Quetelet writes in
his fourth letter⁸³


le géomètre anglais Bayes proposa la règle suivante pour en
apprécier la valeur: Quand on a observé plusieurs fois de suite
un même événement, la probabilité qu’il existe une cause qui
en facilite la reproduction, est exprimée par une fraction qui
a, pour dénominateur, le nombre 2, multiplié autant de fois
par lui-même que l’événement a été observé de fois, et pour
numérateur, le même produit moins 1. [1846, p. 24]


As we have already seen, however, this is more in line with Price’s
Appendix to Bayes’s Essay than with the work of the “English geometrician”
himself⁸⁴.

I shall quote in full the first example given by Quetelet as it shows an 
interesting point of view. 


Après avoir vu monter la mer périodiquement dix fois de suite,
à douze heures et demie de distance environ, si l’on se demande
quelle est la probabilité qu’elle montera encore une onzième fois,
on aura, comme je l’ai déjà dit, 11/12. De plus, d’après le principe
précédent, la probabilité qu’il existe une cause qui nécessite la
reproduction de ce phénomène, sera 2047/2048. [1846, p. 24]


The first probability may be given more symbolically as


$$\Pr[\text{one further occurrence} \mid 10 \text{ occurrences}] = \frac{11}{12} ,$$


or, more generally,


$$\Pr[\text{one further occurrence} \mid n \text{ occurrences}] = \frac{n+1}{n+2} ,$$


which we recognize as (a special case of) Laplace’s rule of succession. The
second probability seems to be calculated in general (and using the Notes
on pp. 369-370 of Quetelet [1846]) from


$$\Pr[\text{event happens by chance } n \text{ times}] = 1/2^{n+1} ,$$


or, equivalently,


$$\Pr[\text{a cause exists}] = 1 - \Pr[\text{event happens by chance}] = \frac{2^{n+1} - 1}{2^{n+1}} .$$


(In the afore-mentioned Notes the formula is quoted with reference to
Cournot [1843, p. 155], and is followed ([1846, p. 370]) by the words “que
l’on a nommée la règle de Bayes”; so it may well have been, but not by
Cournot: see §8.10.)
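The two probabilities in Quetelet's tidal example can be checked with exact rational arithmetic. The following is only a minimal sketch: the function names are illustrative assumptions, and n stands for the number of successive occurrences observed.

```python
from fractions import Fraction


def next_occurrence(n: int) -> Fraction:
    """Rule of succession: chance of one more occurrence after n in a row."""
    return Fraction(n + 1, n + 2)


def cause_exists(n: int) -> Fraction:
    """Quetelet's 'rule of Bayes': 1 - 1/2**(n + 1)."""
    return 1 - Fraction(1, 2 ** (n + 1))


if __name__ == "__main__":
    n = 10  # ten successive tides, as in Quetelet's example
    print(next_occurrence(n))  # 11/12
    print(cause_exists(n))     # 2047/2048
```

With n = 10 the two functions return the fractions quoted from the fourth letter.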

Quetelet now tells the Grand Duke that


Votre Altesse voit que nous avons plus de raisons de croire à
l’existence d’une cause qui a facilité dix fois de suite la
reproduction du même phénomène dans les mêmes circonstances, qu’à sa
reproduction prochaine pour la onzième fois. [1846, pp. 24-25]


That is, in general,


$$\frac{2^{n+1} - 1}{2^{n+1}} > \frac{n+1}{n+2} ,$$


which is easily seen to be true for n ∈ {1, 2, ...}.
However, we then read that


En général, la probabilité qu’il existe une cause qui nécessite
la reproduction d’un événement observé plusieurs fois de suite,
croît beaucoup plus rapidement que la probabilité du prochain
retour de cet événement. [1846, p. 25]


Now it is not quite clear, at least to me, what Quetelet means by the phrase


[$p_2$] increases much more rapidly than [$p_1$]. (*)


On referring to the Notes we find the words


Les deux probabilités convergent donc vers la certitude, à mesure
que l’événement se répète plus de fois, mais d’une manière
inégalement rapide. La dernière probabilité croît le plus
rapidement. [1846, p. 370]


On setting


$$p_1(x) = (x + 1)/(x + 2) , \qquad p_2(x) = (2^{x+1} - 1)/2^{x+1} ,$$


we find, in the Notes,


$$p_1(x) = p_2(x') \;\Rightarrow\; x = 2(2^{x'} - 1) ,$$


together with a table of pairs (x, x′) for which this last equality is satisfied.

Now while it is certainly true that $p_2(x_0) > p_1(x_0)$ for any $x_0$, the phrase
(*) seems to say something about “rates of increase”, and it is certainly
not true that the derivatives satisfy $p_2'(x) > p_1'(x)$ for all x. Indeed,


$$p_1'(x) = 1/(x + 2)^2 , \qquad p_2'(x) = 2^{-(x+1)} \ln 2 ,$$


and it is easy to show, by substitution, that


$$p_1'(x) < p_2'(x) , \quad x \in \{1, 2, 3\} .$$


The inequality $p_1'(x) > p_2'(x)$ may equivalently be written as


$$y(x) = \frac{2^{x+1}}{(x + 2)^2} > \ln 2 .$$


Since this is true for x = 4, and since y(x + 1) > y(x), it follows that


$$p_1'(x) > p_2'(x) , \quad x \in \{4, 5, \ldots\} .$$


We note, in passing, that $p_1'(x) = p_2'(x)$ for x ≈ 3.26301.

Thus it is not true that $p_2'(x) > p_1'(x)$ for all x ∈ ℕ, and we must
therefore conclude that by (*) is meant merely that, in our notation,
$p_2(x) > p_1(x)$ for all x, or


$$\frac{2^{n+1} - 1}{2^{n+1}} > \frac{n+1}{n+2} , \quad \forall n \in \mathbb{N} .$$
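The comparison of the two rates of increase can be checked numerically. The sketch below assumes the readings p1(x) = (x + 1)/(x + 2) for the probability of the next recurrence and p2(x) = 1 - 2^-(x+1) for the probability of a cause; the function names and the bisection bracket [3, 4] are illustrative choices.

```python
import math


def p1(x: float) -> float:
    """Probability of the next recurrence, (x + 1)/(x + 2)."""
    return (x + 1) / (x + 2)


def p2(x: float) -> float:
    """Probability that a cause exists, 1 - 2**-(x + 1)."""
    return 1 - 2.0 ** -(x + 1)


def dp1(x: float) -> float:
    """Derivative of p1."""
    return 1 / (x + 2) ** 2


def dp2(x: float) -> float:
    """Derivative of p2."""
    return 2.0 ** -(x + 1) * math.log(2)


def crossing(lo: float = 3.0, hi: float = 4.0, tol: float = 1e-12) -> float:
    """Bisection for the x at which the two derivatives agree."""
    def diff(x: float) -> float:
        return dp1(x) - dp2(x)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(lo) * diff(mid) <= 0:
            hi = mid   # sign change lies in [lo, mid]
        else:
            lo = mid   # sign change lies in [mid, hi]
    return (lo + hi) / 2


if __name__ == "__main__":
    for x in range(1, 7):
        print(x, dp1(x), dp2(x))
    print("derivatives agree near x =", crossing())
```

The table printed by the loop shows the derivative of p2 exceeding that of p1 only for x = 1, 2, 3, while p2 itself stays above p1 throughout.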


8.13 Mathurin-Claude-Charles Gouraud 
(1823-?)


In 1848 Gouraud⁸⁵ published his Histoire du Calcul des Probabilités depuis
ses origines jusqu’à nos jours, a work that Todhunter [1865] describes as


a popular narrative entirely free from mathematical symbols, 
containing however some important specific references. Exact 
truth occasionally suffers for the sake of a rhetorical style un- 
suitable alike to history and to science; nevertheless the general 
reader will be gratified by a lively and vigorous exhibition of 
the whole course of the subject. [p. x] 


Gouraud correctly attributes [p. 47] the contributions made by Bernoulli 
and de Moivre to the result generally known by the former’s name; and he 
later [pp. 61-62] contrasts this result with that proved by Bayes. He then 
points out [pp. 62-63] the use and development of Bayes’s Theorem made 
by Laplace, “le sublime géomètre” [p. 64]. He perhaps errs, however, in
referring to Condorcet’s Essai of 1785 as containing the


principe récemment entrevu par Bayes et démontré par Laplace.
[pp. 95-96]


The same error is repeated towards the end of this Histoire, where we find 
the words 


Le Principe entrevu par Bayes et analytiquement démontré par
Laplace, qui consiste à conclure la probabilité des causes et
de leur action future de la simple observation des événements
passés. [p. 146]


We have in fact already discussed the distinct contributions made by Bayes
and Laplace in this respect.

Reference is made to Price’s actuarial work and Condorcet’s memoir 
of 1781-1784, stressing the latter’s work on (a) the determination of the 
probability of future events from the observation of past events, and (b) 
testimony. Several pages are devoted to a discussion of Laplace’s Théorie
analytique des probabilités, including the Essai philosophique sur les
probabilités, and despite fulsome praise of this work, Gouraud considers Laplace’s
historical comments to be too short in respect of certain passages and to 
have some regrettable omissions. 


8.14 Robert Leslie Ellis (1817-1859) 


An early exponent of the frequency interpretation of probability, Ellis⁸⁶
published in 1849 in the Transactions of the Cambridge Philosophical
Society [1844] a paper entitled “On the foundations of the theory of
probabilities”. Although aimed at showing “the inconsistency of the theory of
probabilities with any other than a sensational philosophy” [p. 1], the
paper contains some comments on inverse probability.

Thus, writing of the application of probability to inductive results, Ellis 
notes that, if a certain event has been observed to occur on m occasions, 
“there is a presumption that it will recur on the next occasion” [p. 4], a 
presumption estimated by (m+ 1)/(m + 2). This, however, prompts two 
questions, viz. 


What shall constitute a “next occasion?” What degree of simi- 
larity in the new event to those which have preceded it, entitles 
it to be considered a recurrence of the same event? [p. 4] 


questions that Ellis considers with special reference to a simple example 
appearing in de Morgan’s Essay on Probabilities [1838, p. 64].
Finding the 


assertion ... that 3/4 is the probability that any observed event
had on an à priori probability greater than 1/2, or that three out
of four observed events had such an à priori probability [p. 5]


to be completely lacking in precision, Ellis proposes the following frequency
explanation⁸⁷. Suppose that a large number h of trials are performed, in
each of which the probability of a certain event is 1/m. Then let a second
sequence of h trials be carried out, the probability now being 2/m, &c.
After all these sequences, approximately $h \sum_{i=1}^{m} i/m$ of the sought
events will have occurred, of which $h \sum_{i=1}^{m/2} (1/2 + i/m)$ had an a
priori probability greater than 1/2. The ratio of the second to the first of
these series gives (3m + 2)/(4m + 4), which has the limit 3/4 as m → ∞.
Similarly, if p events are taken in succession from each trial, rather than the
single events considered before, then we are led to consideration of the ratio


$$h \sum_{i=1}^{m/2} \{(1/2) + i/m\}^p \Big/ h \sum_{i=1}^{m} (i/m)^p ,$$


a ratio that tends to


$$\int_{1/2}^{1} x^p \, dx \Big/ \int_{0}^{1} x^p \, dx = 1 - (1/2)^{p+1} ,$$


and that is “applied to determine the probability of a common cause among
similar phenomena” [p. 6].
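Ellis's ratios are easy to evaluate for finite m. The sketch below is an illustration only: it assumes m even (so that m/2 is an integer), drops the common factor h, and uses hypothetical function names.

```python
from fractions import Fraction


def ellis_ratio(m: int, p: int = 1) -> Fraction:
    """Share of the observed events whose prior chance exceeded 1/2.

    The common factor h cancels between numerator and denominator.
    """
    if m % 2:
        raise ValueError("m is assumed even so that m/2 is an integer")
    upper = sum((Fraction(1, 2) + Fraction(i, m)) ** p
                for i in range(1, m // 2 + 1))
    total = sum(Fraction(i, m) ** p for i in range(1, m + 1))
    return upper / total


def ellis_limit(p: int) -> Fraction:
    """Limiting value 1 - (1/2)**(p + 1) as m grows."""
    return 1 - Fraction(1, 2) ** (p + 1)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, float(ellis_ratio(m)), float(ellis_ratio(m, p=3)))
    print(float(ellis_limit(1)), float(ellis_limit(3)))
```

For p = 1 the exact value (3m + 2)/(4m + 4) is recovered for every even m, and for larger p the finite ratio settles towards 1 - (1/2)^(p+1).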

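To evaluate the ratio of the sums given above the Euler-Maclaurin
summation formula in the form


$$\sum_{i=0}^{n} f(i) = \int_0^n f(x) \, dx + \frac{1}{2} [f(0) + f(n)] + \sum_{k \ge 1} \frac{B_{2k}}{(2k)!} \left[ f^{(2k-1)}(n) - f^{(2k-1)}(0) \right]$$


may be used (here the $B_{2k}$ are the Bernoulli Numbers; see Knopp [1990,
§64]). Thus


$$\sum_{i=1}^{m} (i/m)^p = \int_0^m (x/m)^p \, dx + \frac{1}{2} + \frac{B_2}{2!} \left[ \frac{p}{m} \right] + \frac{B_4}{4!} \left[ \frac{p(p-1)(p-2)}{m^3} \right] + \cdots = \frac{1}{m^p} \left[ \frac{m^{p+1}}{p+1} + o(m^{p+1}) \right] .$$


Similarly


$$\sum_{i=1}^{m/2} \{(1/2) + (i/m)\}^p = \frac{1}{m^p} \sum_{i=1}^{m/2} [(m/2) + i]^p = \frac{1}{m^p} \left\{ \frac{m^{p+1} - (m/2)^{p+1}}{p+1} + o(m^{p+1}) \right\} .$$


The ratio considered by Ellis then becomes, as m → ∞,


$$1 - 1/2^{p+1} + o(1) ,$$


in agreement with Ellis’s observation.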
Ellis concludes his essay with the following observations:


The principle on which the whole depends, is the necessity of
recognizing the tendency of a series of trials towards regularity,
as the basis of the theory of probabilities. I have also attempted
to show that the estimates furnished by what is called the theory
à posteriori of the force of inductive results are illusory. [p. 6]


8.15 Viktor Yakovlevitch Buniakovskii 
(1804-1889) 


Perhaps better known for his work in analysis and the theory of numbers, 
Buniakovskii wrote a number of papers, and one book, in which probabilis- 
tic methods were used. All of these are described in Sheynin [1991-1992], 
where a complete list of Buniakovskii’s writings on probability, both
applied and theoretical, may be found⁸⁸.

An appendage to Buniakovskii [1846] was later published as a separate 
memoir (Buniakovskii [1850]). Here consideration was given to the following 
problem: 


La question que nous nous proposons de résoudre analytique- 
ment consiste donc a déterminer la probabilité que la perte en 
hommes ne dépassera pas certaines limites, fixées d’avance, ainsi 
que l’étendue de ces limites pour une probabilité dont on sera 
convenu du minimum. [1850, p. 235] 


Buniakovskii supposes that N men take part in an action in a battle. Of
these, n are nominally chosen at a specific time, and of these in turn i are
found to have been hors de combat at some time from the beginning of
the action to the time of observation. If x denotes the probability that a
specific soldier is put out of action, then the a priori probability of the
observed event is


$$\binom{n}{i} x^i (1 - x)^{n-i} . \qquad (41)$$


Now x can take on any one of the values in the set


$$\{ i/N, \; (i + 1)/N, \; \ldots, \; (i + N - n)/N \} ,$$


each value being equally probable. This leads to a sequence
$P_1, P_2, \ldots, P_{N-n+1}$ given by (41), and the probable number of soldiers
injured is k = iN/n.

Using Bayes’s formula, Buniakovskii next notes that the probability of
the jth hypothesis (presumably after the observations have been made,
though this is not stated) is


$$P_j \Big/ \sum_{l=1}^{N-n+1} P_l ,$$
while


la probabilité p de l’existence de l’une quelconque des hypothèses
pour lesquelles le nombre total des individus mis hors de
combat est compris entre les limites k − w et k + w inclusivement
[1850, p. 237]


is given by


$$p = \sum_{j=\alpha}^{\beta} P_j \Big/ \sum_{j=1}^{N-n+1} P_j , \qquad (42)$$


where α = k − w − i + 1 and β = k + w − i + 1. This probability is interpreted
verbally as


la probabilité que le nombre réel d’individus mis hors de
combat, sera compris, inclusivement, entre les limites k − w et k + w,
w désignant un entier plus ou moins grand. [1850, p. 236]


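Defining $x_0$, X, x′ and x″ by


$$x_0 = \frac{i}{N} , \quad X = \frac{i + N - n}{N} , \quad x' = \frac{k - w}{N} , \quad x'' = \frac{k + w}{N} ,$$


Buniakovskii notes that (42) may be written as


$$p = \sum_{x=x'}^{x''} x^i (1 - x)^{n-i} \Big/ \sum_{x=x_0}^{X} x^i (1 - x)^{n-i} . \qquad (43)$$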


Ainsi, le rapport de ces deux sommes, prises chacune inclusivement
entre les limites qui viennent d’être désignées, représentera
la probabilité que, d’après l’événement observé, le nombre
d’individus mis hors de combat, sur une totalité N, est compris entre
les limites k − w et k + w inclusivement. [1850, p. 238]
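The ratio of sums described in this passage can be evaluated exactly for small cases. The sketch below assumes a uniform prior over the N - n + 1 admissible totals i, i + 1, ..., i + N - n and drops the binomial coefficient, which cancels; the function name and the sample figures are hypothetical.

```python
from fractions import Fraction


def loss_probability(N: int, n: int, i: int, w: int) -> Fraction:
    """Posterior chance that total losses lie in [k - w, k + w], k = i*N/n."""
    def weight(total: int) -> Fraction:
        x = Fraction(total, N)
        return x ** i * (1 - x) ** (n - i)   # likelihood up to a constant

    k = Fraction(i * N, n)
    totals = range(i, i + N - n + 1)         # admissible numbers of losses
    inside = sum(weight(t) for t in totals if k - w <= t <= k + w)
    return inside / sum(weight(t) for t in totals)


if __name__ == "__main__":
    # Hypothetical figures: 200 men, 20 inspected, 5 of them out of action.
    for w in (5, 10, 20, 40):
        print(w, float(loss_probability(200, 20, 5, w)))
```

Widening the limits w can only add hypotheses to the numerator, so the printed probabilities are non-decreasing in w.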


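The rest of the paper is taken up with an approximation to (43), it being
shown that p is approximately equal to


$$\left\{ \frac{2}{\sqrt{\pi}} \int_0^T e^{-t^2} \, dt + K \right\} \Big/ \left\{ 1 - \frac{13 i (n - i) - n^2}{12 i (n - i) n} \right\} ,$$


where T and K are defined by


$$T = \frac{n \sqrt{n}}{\sqrt{2 i (n - i)}} \, \frac{w}{N} ,$$

$$K = \frac{n^{n+1} \sqrt{n}}{2 N i^i (n - i)^{n-i} \sqrt{2 \pi i (n - i)}} \left[ (x'')^i (1 - x'')^{n-i} + (x')^i (1 - x')^{n-i} \right] .$$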
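A numerical comparison gives some feeling for how close such an approximation is to the exact ratio of sums. The sketch below rests on an assumed reading of the approximation: the leading term is the error function erf(T) with T = n√n w / (N √(2i(n − i))), K is an end-point correction, and the divisor is 1 − [13i(n − i) − n²]/[12i(n − i)n]. The function names and the figures N = 10000, n = 400, i = 100, w = 500 are hypothetical.

```python
import math


def exact_p(N: int, n: int, i: int, w: int) -> float:
    """Ratio of sums over the admissible totals, in floating point."""
    def f(total: int) -> float:
        x = total / N
        return math.exp(i * math.log(x) + (n - i) * math.log(1 - x))

    k = i * N / n
    totals = range(i, i + N - n + 1)
    inside = sum(f(t) for t in totals if k - w <= t <= k + w)
    return inside / sum(f(t) for t in totals)


def approx_p(N: int, n: int, i: int, w: int) -> float:
    """Assumed form of the approximation: (erf(T) + K) / correction."""
    k = i * N / n
    x1, x2 = (k - w) / N, (k + w) / N        # must lie strictly in (0, 1)
    T = n * math.sqrt(n) * w / (math.sqrt(2 * i * (n - i)) * N)

    def log_f(x: float) -> float:
        return i * math.log(x) + (n - i) * math.log(1 - x)

    # Logarithm of the factor multiplying the bracket in K (avoids overflow).
    log_scale = ((n + 1.5) * math.log(n) - math.log(2 * N)
                 - i * math.log(i) - (n - i) * math.log(n - i)
                 - 0.5 * math.log(2 * math.pi * i * (n - i)))
    K = math.exp(log_scale + log_f(x1)) + math.exp(log_scale + log_f(x2))
    correction = 1 - (13 * i * (n - i) - n * n) / (12 * i * (n - i) * n)
    return (math.erf(T) + K) / correction


if __name__ == "__main__":
    args = (10000, 400, 100, 500)
    print(exact_p(*args), approx_p(*args))
```

Working with logarithms keeps factors such as n^(n+1) from overflowing; the two printed values should agree closely when n and N are large.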
In his Osnovaniya Matematicheskoi Teorii Veroyatnostei (The Principles
of the Mathematical Theory of Probability) of 1846, Buniakovskii considered
the application of probabilistic methods to electoral results and to
testimonies. Not having access to the original, I am forced to rely on Sheynin
[1991-1992] here.

Suppose that there are s witnesses, the testimonies of each of whom have
the same probability p (> 1/2). Of these witnesses, r assert that a certain
fact occurred while q = s − r (with q < r) assert that it did not. The
probability that the first group tells the truth is then


$$\frac{p^{r-q}}{p^{r-q} + (1 - p)^{r-q}} ,$$


a probability that is coincident with the probability of a unanimous
statement for r − q witnesses. The case of s = 212 and r = 112 is then
equivalent, Buniakovskii noted, to that in which s = r = 12. (Another example
may be found in Sheynin [1991-1992, pp. 208-209].)

Laplace’s Constantinople example⁸⁹, suitably transformed, is repeated
here. Buniakovskii supposes that two eye-witnesses declare that letters
selected from the thirty-six-letter Russian alphabet make up the word
Moskva. Suppose too that the two witnesses are equally trustworthy, with
$p_1 = p_2 = 9/10$, that the letters are drawn at random, and that the
total number of six-letter Russian words is 50,000. Then, by formula (41),
p = 81/82, and the probability that an intelligible word is formed is


$$50{,}000 / {}_{36}P_6 = 1/28{,}048 .$$


Generalizing the preceding formula to cover the case of witnesses that are
not equally trustworthy, one has


$$P = p_1 p_2 / [p_1 p_2 + (1 - p_1)(1 - p_2)] ;$$


and with $p_1 = 81/82$ and $p_2 = 1/28{,}048$ Buniakovskii gets P = 1/347 as
the probability of a reasonable word.
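The arithmetic of the testimony examples can be reproduced with exact fractions. This is a sketch under the assumption that the witnesses are independent and that truth and falsehood are a priori equally likely; the function names are illustrative.

```python
from fractions import Fraction
from math import perm


def agreeing_witnesses(p: Fraction, r: int, q: int) -> Fraction:
    """Chance that the r affirming witnesses are right when q others deny."""
    yes = p ** r * (1 - p) ** q      # fact true: r truthful, q mistaken
    no = p ** q * (1 - p) ** r       # fact false: q truthful, r mistaken
    return yes / (yes + no)


def combine(p1: Fraction, p2: Fraction) -> Fraction:
    """Two concurring testimonies of unequal weight."""
    return p1 * p2 / (p1 * p2 + (1 - p1) * (1 - p2))


if __name__ == "__main__":
    p = Fraction(9, 10)
    print(agreeing_witnesses(p, 2, 0))                 # 81/82
    word = Fraction(50_000, perm(36, 6))               # chance of a real word
    print(float(1 / word))                             # about 28048.2
    print(float(1 / combine(Fraction(81, 82), Fraction(1, 28_048))))
```

The last line prints a value close to 347, and the first function confirms that 112 witnesses against 100 carry the same weight as 12 unanimous ones.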


8.16 William Fishburn Donkin (1814-1869) 


In 1851 Donkin⁹⁰ published, in three parts in the Philosophical Magazine,
an article entitled “On certain questions relating to the theory of
probabilities.” He begins by taking it as “generally admitted ... that the
subject-matter of calculation in the mathematical theory of probabilities is
quantity of belief” [p. 353], an observation that puts him squarely in the
non-frequentist camp⁹¹.

The law on which the whole theory is based is stated to be the following: 


When several hypotheses are presented to our mind, which we 
believe to be mutually exclusive and exhaustive, but about 
which we know nothing further, we distribute our belief equally
amongst them. [p. 354] 


This being granted, the rest of the theory “follows as a deduction of the way 
in which we must distribute it in complex cases, if we would be consistent” 
(loc. cit.). Further evidence of Donkin’s subjective views, perhaps more 
in the style of Harold Jeffreys than Bruno de Finetti, is furnished by the 
observation that probability is 


always relative to a particular state of knowledge or ignorance; 
but it must be observed that it is absolute in the sense of not be- 
ing relative to any individual mind; since, the same information 
being presupposed, all minds ought to distribute their belief in 
the same way. [p. 355] 


Perhaps the most important result in the paper, and certainly the most
fundamental, is the following⁹²:


Theorem. If there be any number of mutually exclusive
hypotheses, $h_1, h_2, h_3, \ldots$, of which the probabilities relative to
a particular state of information are $p_1, p_2, p_3, \ldots$, and if new
information be gained which changes the probabilities of some
of them, suppose of $h_{m+1}$ and all that follow, without having
otherwise any reference to the rest, then the probabilities of
these latter have the same ratios to one another, after the new
information, that they had before; that is


$$p_1' : p_2' : p_3' : \cdots : p_m' = p_1 : p_2 : p_3 : \cdots : p_m ,$$


where the accented letters denote the values after the new
information has been acquired. [p. 356]


Whether this⁹³ might not preferably be termed an axiom⁹⁴ is arguable:
indeed, Donkin himself seems to suggest this, since he finds it “certainly as
evident before as after any proof which can be given of it” [p. 356]. Boole,
in his An Investigation of the Laws of Thought [1854], adds this result as 
an eighth principle to his list of similar fundamentals taken mainly from 
Laplace, and it can also be related to Burnside’s [1928, p. 4] modification of 
the usual “equally likely” definition of probability, in terms of which “each 
two of the n results are assumed to be equally likely” (emphasis added). 

A similar result had been given a few years earlier by de Morgan. In his 
Formal Logic of 1847 we read 


Again, if there be several events, which are not all that could 
have happened; and if, by a new arrangement (or by additional 
knowledge of old ones) we find that these several events are now 
made all that can happen, without alteration of their relative 
credibilities: their probabilities are found by the same rule. If 
a, b, c, &c. be the probabilities of the several events, when not
restricted to be the only ones: then, after the restriction, the
probability of the first is a ÷ (a + b + c + ···), of the second,
b ÷ (a + b + c + ···) and so on. [p. 190]


An assumption of mutual exclusiveness, explicitly stated by Donkin, is 
needed here. 

In a recent paper Ramer has, perhaps unwittingly, considered Donkin’s
Theorem. He supposes (Ramer [1990]) that one has a probability
distribution $\{p_i = \Pr[x_i]\}$, $i \in \{1, 2, \ldots, n\}$, with $p_i$ not identically
zero on the set $\{x_1, x_2, \ldots, x_m\}$, for m < n. Then (see also Note 93 to
this Section) the associated conditional probability is given by


$$q_i = p_i \Big/ \sum_{j=1}^{m} p_j .$$


Ramer (op. cit.) shows that this conditional distribution is that one whose 
distance from the original distribution is minimal. 
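The ratio-preserving character of this conditional distribution is easily illustrated. In the sketch below the function name and the sample distribution are hypothetical; the retained hypotheses are taken to be the first m in the list.

```python
from fractions import Fraction


def condition(p, m):
    """Renormalise the first m probabilities: q_i = p_i / (p_1 + ... + p_m)."""
    kept = p[:m]
    total = sum(kept)
    if total == 0:
        raise ValueError("the retained hypotheses must have positive probability")
    return [x / total for x in kept]


if __name__ == "__main__":
    p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
    q = condition(p, 3)
    print(q)   # the retained values stay in the ratio 1 : 2 : 3
```

Because every retained probability is divided by the same total, the ratios between them are unchanged, as Donkin's Theorem requires.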

Donkin next considers as a specialization of his result the case in which 
the new information obtained is to the effect that some of the hypotheses 
must be rejected, or others admitted, or both. 

From these two theorems the following results ensue


(a) Pr[H & h] = Pr[h | H] Pr[H] ;


(b) $\Pr[H_i \mid h] = \Pr[h \mid H_i] \Pr[H_i] \Big/ \sum_j \Pr[h \mid H_j] \Pr[H_j]$ ;⁹⁵


(c) If $\{H_i\}$ are mutually exclusive and exhaustive and $S_1$ and $S_2$ are
two (independent) states of information, then


$$\Pr[H_i \mid S_1, S_2] = \Pr[H_i \mid S_1] \Pr[H_i \mid S_2] \Big/ \sum_j \Pr[H_j \mid S_1] \Pr[H_j \mid S_2] ,$$


where the $H_i$ are a priori equally likely and $S_1$ and $S_2$ are
conditionally independent given each $H_i$;


(d) extension of (c) to several independent sources of information. 


The last two of these results are an early contribution to the problem of 
the assessment of probabilities on different (and on combined) data. 
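Result (c) can be checked on a small discrete example. The sketch below assumes three hypotheses and made-up likelihoods for two conditionally independent pieces of information; all names and numbers are hypothetical.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Discrete Bayes: posterior_i is proportional to prior_i * likelihood_i."""
    joint = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def combine(post1, post2):
    """Result (c): product of the two separate posteriors, renormalised."""
    return posterior(post1, post2)


if __name__ == "__main__":
    lik1 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]   # Pr[S1 | H_i]
    lik2 = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 3)]   # Pr[S2 | H_i]
    both = [a * b for a, b in zip(lik1, lik2)]                # Pr[S1, S2 | H_i]
    flat = [Fraction(1, 3)] * 3
    print(posterior(flat, both))
    print(combine(posterior(flat, lik1), posterior(flat, lik2)))
```

The two printed lists coincide for the flat prior. With an unequal prior the product rule counts the prior twice, which is why the equal-likelihood condition is needed.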

In the course of discussion of some miscellaneous examples illustrating
the use of these theorems, Donkin distinguishes between a priori,
provisional and a posteriori probabilities. The first of these terms refers to
“probabilities derived from information which we possess antecedently to
the observation of the phenomenon considered” [p. 360], while the last is
defined in the usual way. Provisional probability is illustrated as follows:
suppose an approximate value $p_0$ of p is assigned, with belief as to the
precision of the approximation expressed by φ(p), where


(i) φ(p) is maximized by p = $p_0$;


(ii) $\int_0^1 \varphi(p) \, dp = 1$;


(iii) φ(p) dp “is my belief that the true value would turn out to lie between
p and p + dp” [p. 362].


Then the provisional probability of p is just its expectation. 

As an illustration of the distinction necessary to be observed between 
provisional and a posteriori probability, Donkin considers the following 
problem: 


An event E has been observed, which can only have resulted
from some one or other of the causes C',C”,... of which any 
one would necessarily produce it, and no two could coexist. It 
is required to assign the probability that it has resulted from 
C. [p. 362] 


If one now defines Pr[C | H] = a, Pr[C | H̄] = b, Pr[U′C | H] = α,
Pr[U′C | H̄] = β, where H (respectively H̄) denotes that a hypothesis H
is true (respectively false), and U′C denotes the existence of some cause
other than C, one can show, using the definition of E, that


$$\Pr[C \mid E, p] = \frac{\Pr[C \,\&\, E \mid p]}{\Pr[E \mid p]} = \frac{pa + (1 - p)b}{p(a + \alpha) + (1 - p)(b + \beta)} . \qquad (44)$$


If our “provisional” value of p is


$$w = \int_0^1 p \, \varphi(p) \, dp ,$$


then we may give the solution, from (44), as


$$(kw + \ell)/(w + m) , \qquad (45)$$


where k, ℓ and m are all known. However, one may also argue that


$$\varphi(p) [(kp + \ell)/(p + m)] \, dp$$


expresses the quantity of our belief that the value of p lies between p and
p + dp, and that C caused E. Then the solution of our problem is


$$\int_0^1 \varphi(p) (kp + \ell)/(p + m) \, dp . \qquad (46)$$


The distinction between these two solutions is noted by Donkin thus: 


[(46)] expresses a real provisional solution; that is, it expresses
our belief in the existence of C, influenced by the consideration
that we do not possess a definitive knowledge of p. Whereas
[(45)] expresses a solution obtained by treating a provisional
value of p as if it were definitive, or it is what would be the
definitive solution of the problem to a person whose state of
information (antecedently to the event E) was such that w was
to him the definitive a priori probability of H. [pp. 363-364]


He asserts further that while (46) is right in principle, (45) is right in
result, a claim that is justified on the following wise: denoting by P the
random variable taking on the value p, we have (in a notation different to
Donkin’s)⁹⁶


$$\Pr[p < P < p + dp \;\&\; E] = [p(a + \alpha) + (1 - p)(b + \beta)] \, \varphi(p) \, dp$$

$$\Pr[p < P < p + dp \mid E] = \frac{p(a + \alpha) + (1 - p)(b + \beta)}{w(a + \alpha) + (1 - w)(b + \beta)} \, \varphi(p) \, dp$$

$$\Pr[p < P < p + dp \;\&\; C \mid E] = \Pr[p < P < p + dp \mid E] \, \Pr[C \mid E, p] .$$


Therefore


$$\Pr[C \mid E] = \int_0^1 \Pr[p < P < p + dp \;\&\; C \mid E] = (kw + \ell)/(w + m) ,$$


which is the same as (45).
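A small numerical experiment illustrates the difference between averaging over the prior belief in p and averaging over the belief in p after E has been observed. Everything specific in the sketch is an assumption made for illustration: the belief density is taken to be Beta(2, 2), so that the provisional value is 1/2, and the four conditional probabilities are invented.

```python
def cause_given_E(p, a, alpha, b, beta):
    """Chance of cause C given E when the probability of H is p."""
    return (p * a + (1 - p) * b) / (p * (a + alpha) + (1 - p) * (b + beta))


def integrate(g, steps=100_000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / steps
    return h * sum(g((j + 0.5) * h) for j in range(steps))


def phi(p):
    """Assumed belief density for p: Beta(2, 2)."""
    return 6 * p * (1 - p)


A, ALPHA, B, BETA = 0.6, 0.2, 0.1, 0.5      # hypothetical values


def pr_E(p):
    """Chance of E given p: some cause, C or another, must be present."""
    return p * (A + ALPHA) + (1 - p) * (B + BETA)


w = integrate(lambda p: p * phi(p))                       # provisional value
plug_in = cause_given_E(w, A, ALPHA, B, BETA)             # p replaced by w
prior_avg = integrate(lambda p: phi(p) * cause_given_E(p, A, ALPHA, B, BETA))
norm = integrate(lambda p: pr_E(p) * phi(p))
post_avg = integrate(
    lambda p: pr_E(p) * phi(p) / norm * cause_given_E(p, A, ALPHA, B, BETA))

if __name__ == "__main__":
    print(w, plug_in, prior_avg, post_avg)
```

Averaging over the belief in p updated by E reproduces the plug-in value, whereas averaging over the prior belief gives a slightly different number in this example.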

This part of the paper concludes with an examination of the
probabilities of whether some arrangement of chessmen on a board was produced
by accident or design.

In the second part of his paper Donkin stresses the importance of keeping
clear the distinction between a priori and a posteriori probabilities,
illustrating his remarks with the following application of the discrete Bayes’s
Theorem: let p be the a priori probability of an event that a witness asserts
has happened, and let v and w be the a priori probabilities that he chooses
to assert it supposing it to be true or false respectively. Then the
probability, after his assertion, that the event really happened is
pv/[pv + (1 − p)w].

This second part concludes with a discussion of the probability of the 
existence of binary stars: this is considered in §5.4 of the present tractate. 


8.17 George Boole (1815-1864) 


Although chiefly, and justifiably, remembered for his work in mathematical
logic, Boole devoted considerable time to other branches of mathematics⁹⁷.
His work on probability is contained in the main in some dozen papers and
in his book An Investigation of the Laws of Thought of 1854.

In the first of these papers, an essay among the Boole manuscripts in the
Royal Society Library entitled “Sketch of a theory and method of
probabilities founded upon the calculus of logic”, we find the statement of a
general problem⁹⁸ that was to be the chief object of Boole’s attention in
his writings on probability, viz.


Given the probabilities of any events, simple or compound, to 
ascertain the probability of any other event. [Boole 1952, p. 158] 


The solution presented here lacks the clarity evident in later writings, and 
we shall accordingly postpone any discussion of it for the time being. 

In 1851 Boole published a paper on Michell’s problem of the distribution
of the fixed stars⁹⁹. This paper is considered in the context of that problem
in §5.4 of the present work: it is sufficient to note here that we find again the
statement of the general problem (in terms of probabilities of propositions
rather than events) [Boole 1952, p. 251]. This problem, and its solution,
are further stated as follows:


Given the probability (p) of the truth of the proposition, If the 
condition A is satisfied, the event B will not happen. Required 
the probability P of the proposition, If the event B does happen, 
the condition A has not been satisfied. The result which I obtain 
is 


P = c(1—a)/[ce(1— a) +a(1—p)], 


where c and a are arbitrary constants, whose interpretation is as
follows: viz. a is the probability of the fulfilment of the condition
A, c is the probability that the event B would happen if the
condition A were not satisfied. [1851a, p. 528]


In general, Boole’s solutions contain arbitrary constants, specification of 
which usually yields bounds within which the desired probability must lie. 
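Boole's formula agrees with an ordinary application of Bayes's Theorem if the conditional propositions are read as conditional probabilities. The sketch below makes that reading explicit, taking p as Pr[not B | A]; this reading, the function names and the sample values of a and c are all assumptions.

```python
from fractions import Fraction


def boole_P(p, a, c):
    """Boole's expression P = c(1 - a) / [c(1 - a) + a(1 - p)]."""
    return c * (1 - a) / (c * (1 - a) + a * (1 - p))


def bayes_P(p, a, c):
    """Pr[not A | B] from an explicit joint distribution over (A, B)."""
    joint = {
        (True, True): a * (1 - p),          # A holds and B happens
        (True, False): a * p,               # A holds and B does not happen
        (False, True): (1 - a) * c,         # A fails and B happens
        (False, False): (1 - a) * (1 - c),  # A fails and B does not happen
    }
    pr_B = joint[(True, True)] + joint[(False, True)]
    return joint[(False, True)] / pr_B


if __name__ == "__main__":
    p, a, c = Fraction(9, 10), Fraction(1, 4), Fraction(1, 3)
    print(boole_P(p, a, c), bayes_P(p, a, c))   # 10/11 10/11
```

The result depends on the constants a and c and is in general different from p.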

In another paper of 1851, “Further observations on the theory of
probabilities”, the general problem and its solution of the previous paper are
further discussed. Boole takes exception to Herschel’s doctrine that P = p
(cf. Herschel [1857, p. 421]), an opinion that seems implicitly sanctioned by
Laplace in his Essai philosophique sur les probabilités¹⁰⁰, and also queries
de Morgan’s choice of the constants a and c as 1/2 and 1 respectively¹⁰¹.

In The Cambridge and Dublin Mathematical Journal, Vol. VI, November
1851, Boole proposed the following question:


If an event E can only happen as a consequence of some one
or more of certain causes $A_1, A_2, \ldots, A_n$, and if generally $c_i$
represent the probability of the cause $A_i$, and $p_i$ the probability
that if the cause $A_i$ exist the event E will exist, then the series
of values $c_1, c_2, \ldots, c_n$, $p_1, p_2, \ldots, p_n$, being given, required the
probability of the event E. [1851c, p. 286]


It should be noted here that the $A_i$ are not assumed mutually exclusive.
This result, which has become known as Boole’s “challenge problem”,
occasioned some discussion in mathematical circles¹⁰².

The extension to n causes is given by Boole in An Investigation of The
Laws of Thought as Problem VI of Chapter XX, and discussed by Keynes
[1921, chap. XVII, §2].

Perhaps the first to attack this problem was Arthur Cayley¹⁰³, who in
1853 considered the following special case:


Given the probability α that a cause A will act, and the
probability p that A acting the effect will happen; also the probability
β that a cause B will act, and the probability q that B acting
the effect will happen; required the total probability of the
effect. [p. 259]


As an illustration of his formulation of the question Cayley suggests the 
following: 


say a day is called windy if there is at least w of wind, and a day
is called rainy if there is at least r of rain, and a day is called
stormy if there is at least W of wind, or if there is at least
R of rain. The day may therefore be stormy because of there
being at least W of wind, or because of there being at least R
of rain, or on both accounts; but if there is less than W of wind
and less than R of rain, the day will not be stormy. Then α
is the probability that a day chosen at random will be windy,
p the probability that a windy day chosen at random will be
stormy, β the probability that a day chosen at random will be
rainy, q the probability that a rainy day chosen at random will
be stormy. [p. 259]


Letting λ (μ) denote the probability that “the cause A” (B) acting will act
efficaciously (or, in the illustration, letting λ denote the probability “that
a windy day chosen at random will be stormy by reason of the quantity
of wind” (loc. cit.) with μ being similarly defined with “wind” replaced by
“rain”), Cayley states that μ and λ can be determined from the equations

\[ p = \lambda + (1-\lambda)\mu\beta , \qquad q = \mu + (1-\mu)\lambda\alpha , \tag{47} \]

while the total probability ρ of the effect is given by

\[ \rho = \lambda\alpha + \mu\beta - \lambda\mu\alpha\beta . \tag{48} \]

With α = 1 this system of equations yields ρ = p, a result Cayley seems to
find reasonable.
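The remark about α = 1 is easy to check numerically. The following is a minimal sketch, not anything found in Cayley: the function name `cayley_forward` and the sample values are illustrative assumptions. It takes λ, μ, α and β as given and evaluates Cayley's equations for p, q and the total probability ρ.

```python
def cayley_forward(alpha, beta, lam, mu):
    """Evaluate Cayley's equations for (p, q, rho) from chosen lambda and mu.

    alpha, beta: probabilities that causes A and B act;
    lam, mu: probabilities that A and B, acting, act efficaciously.
    """
    p = lam + (1 - lam) * mu * beta
    q = mu + (1 - mu) * lam * alpha
    rho = lam * alpha + mu * beta - lam * mu * alpha * beta
    return p, q, rho


# Illustrative values only; with alpha = 1 the total probability equals p.
p, q, rho = cayley_forward(alpha=1.0, beta=0.4, lam=0.3, mu=0.6)
print(p, rho)
```

Working forwards from λ and μ avoids solving the pair of equations for them, which is all that is needed to see the special case.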

Cayley’s problem and its solution have been commented on by Boole, 
Wilbraham, Dedekind, Keynes and Hailperin, among others, and, even at 
the risk of heaping Ossa upon Pelion, I shall give some comments of my
own here before considering the remarks of other writers.

Let us suppose first that W > w and R > r, as Cayley’s notation subtly
suggests. Let S_1 denote the event that the day is stormy because of there
being at least W of wind, and let S_2 denote the event that the day is stormy
because of there being at least R of rain. Then

\[
\begin{aligned}
\Pr[w] &= \alpha ; & \Pr[r] &= \beta \\
\Pr[S_1 \vee S_2 \mid w] &= p ; & \Pr[S_1 \vee S_2 \mid r] &= q \\
\Pr[S_1 \mid w] &= \lambda ; & \Pr[S_2 \mid r] &= \mu .
\end{aligned}
\]

Then

\[
\begin{aligned}
p &= \Pr[S_1 \vee S_2 \mid w] \\
  &= \Pr[S_1 \mid w] + \Pr[S_2 \mid w] - \Pr[S_1 \wedge S_2 \mid w] \\
  &= \Pr[S_1 \mid w] + \Pr[S_2 \mid w] - \Pr[S_1 \mid w]\,\Pr[S_2 \mid w]
\end{aligned}
\]

if S_1 and S_2 are independent given w. Thus

\[
\begin{aligned}
p &= \Pr[S_1 \mid w] + \Pr[S_2 \mid w]\,(1 - \Pr[S_1 \mid w]) \\
  &= \lambda + (1-\lambda)\Pr[S_2 \mid w] .
\end{aligned}
\tag{49}
\]

Now

\[
\begin{aligned}
\Pr[S_2 \mid w] &= \Pr[S_2 \wedge r \mid w] + \Pr[S_2 \wedge \bar{r} \mid w] \\
                &= \Pr[S_2 \wedge r \mid w] ,
\end{aligned}
\]

since the assumption that R > r entails that S_2 and \bar{r} are mutually
exclusive. If we assume further that “rain” and “wind” are independent, then

\[
\begin{aligned}
\Pr[S_2 \mid w] &= \Pr[S_2 \wedge r] \\
                &= \Pr[S_2 \mid r]\,\Pr[r] \\
                &= \mu\beta .
\end{aligned}
\]

Substitution in (49) then yields

\[ p = \lambda + (1-\lambda)\mu\beta , \]

and similarly

\[ q = \mu + (1-\mu)\lambda\alpha . \]

An alternative argument, following Hailperin [1986, §6.2], may be given as
follows:

\[
\begin{aligned}
p &= \Pr[S_1 \vee S_2 \mid w] \\
  &= \Pr[S_1 \mid w] + \Pr[\bar{S}_1 \wedge S_2 \mid w] \\
  &= \Pr[S_1 \mid w] + \Pr[\bar{S}_1 \mid w]\,\Pr[S_2 \mid \bar{S}_1 \wedge w] \\
  &= \lambda + (1-\lambda)\Pr[S_2 \mid \bar{S}_1 \wedge w] .
\end{aligned}
\tag{50}
\]

If we assume that S_2 and \bar{S}_1 ∧ w are independent (which loosely put says
that “rain” and “wind” are independent) then

\[ \Pr[S_2 \mid \bar{S}_1 \wedge w] = \Pr[S_2] = \Pr[S_2 \wedge r] = \Pr[S_2 \mid r]\,\Pr[r] , \]

and substitution in (50) yields

\[ p = \lambda + (1-\lambda)\mu\beta , \]

as before.
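Both arguments can be checked by brute force. The sketch below is an illustration under stated assumptions rather than part of either derivation: it assumes four mutually independent binary variables (windy, rainy, and two hypothetical "efficacy" indicators for wind and rain), so that the day is stormy by reason of wind only when it is windy and the wind acts efficaciously, and likewise for rain. The function name and the sample values are invented for the example.

```python
from fractions import Fraction
from itertools import product


def prob_stormy_given_windy(alpha, beta, lam, mu):
    """Pr[S1 or S2 | w] by enumerating the assumed independent model."""
    num = den = Fraction(0)
    for w, r, e1, e2 in product((0, 1), repeat=4):
        # w, r: the day is windy / rainy; e1, e2: wind / rain acts efficaciously
        weight = ((alpha if w else 1 - alpha) * (beta if r else 1 - beta)
                  * (lam if e1 else 1 - lam) * (mu if e2 else 1 - mu))
        s1 = w and e1  # stormy by reason of wind requires a windy day
        s2 = r and e2  # stormy by reason of rain requires a rainy day
        if w:
            den += weight
            if s1 or s2:
                num += weight
    return num / den


alpha, beta = Fraction(1, 2), Fraction(2, 5)
lam, mu = Fraction(3, 10), Fraction(3, 5)
print(prob_stormy_given_windy(alpha, beta, lam, mu), lam + (1 - lam) * mu * beta)
```

Exact rational arithmetic is used so that the enumerated value and λ + (1 − λ)μβ can be compared for equality rather than to within a tolerance.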

Boole’s own solution to his “challenge problem” appeared in The Philo- 
sophical Magazine in 1854 (see Boole [1854b]) and in the same year in 
An Investigation of the Laws of Thought¹⁰⁴. Commenting favourably on
Cayley’s solution, Boole writes 


I have two or three times attempted to solve the problem by the 
same kind of reasoning, and have not approached so near the 
truth as Mr. Cayley has done. [1854b, p. 30] 


Finding Cayley’s solution to be incomplete, however, Boole eliminates λ
and μ from (47) and (48) to obtain

\[ \frac{[1-\alpha(1-p)-u]\,[1-\beta(1-q)-u]}{1-u} = (1-\alpha)(1-\beta) , \tag{51} \]

and this Boole finds to be wrong since the case p = 1, q = 0 yields

\[ u = \alpha(1-\beta) \]

and not u = α as should obtain.
The true solution, Boole claims, is to be found by equating the left-hand
side of (51) to

\[ \frac{(u-\alpha p)(u-\beta q)}{\alpha p + \beta q - u} \]

and taking that root that satisfies

\[ \max\{\alpha p, \beta q\} \le u \le \min\{1-\alpha(1-p),\; 1-\beta(1-q),\; \alpha p + \beta q\} . \]

(This solution is arrived at after what Keynes [1921, chap. XVII, §2]
describes as “calculations of considerable length and great difficulty”.)
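Boole's root can be located numerically without the lengthy algebra. The sketch below is an illustration, not Boole's own procedure. It takes Boole's equation in the form [1 − α(1 − p) − u][1 − β(1 − q) − u]/(1 − u) = (u − αp)(u − βq)/(αp + βq − u), cross-multiplies it, and bisects between the bounds max{αp, βq} and min{1 − α(1 − p), 1 − β(1 − q), αp + βq}. It assumes data for which the lower bound does not exceed the upper; the function name `boole_u` is hypothetical.

```python
def boole_u(alpha, beta, p, q, iterations=200):
    """Bisection for the root of Boole's equation lying within his bounds."""
    a1 = 1 - alpha * (1 - p)
    b1 = 1 - beta * (1 - q)
    ap, bq = alpha * p, beta * q

    def g(u):
        # Cross-multiplied form of the equation; it vanishes at a solution.
        return ((a1 - u) * (b1 - u) * (ap + bq - u)
                - (u - ap) * (u - bq) * (1 - u))

    lo, hi = max(ap, bq), min(a1, b1, ap + bq)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) > 0:  # g is non-negative at the lower bound
            lo = mid
        else:           # and non-positive at the upper bound
            hi = mid
    return (lo + hi) / 2


# In the case p = 1, q = 0 the bounds coincide and the root is alpha.
print(boole_u(0.3, 0.4, 1.0, 0.0))
```

Bisection is chosen over the explicit quadratic formula only because the bounds already bracket the wanted root, so no choice between roots has to be made.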

Returning to the fray in 1862, Cayley first rephrases his question in a 
way that is more akin to Boole’s version, and then points out that the 
question may in fact be viewed in two ways, the solutions to which are 
different. His argument runs as follows (the quotation is long, but I think 
worthy of inclusion here): 


Considering only the causes A and B, the proposed question 
may be considered as being — 

“If the event E can only happen as a consequence of one or
both of the causes A and B, and if α be the probability of the
existence of the cause A, p the probability that, the cause A
existing, the event E will (whether or not as a consequence of
A) happen; and in like manner if β be the probability of the
existence of the cause B, q the probability that, the cause B
existing, the event will (whether or not as a consequence of
B) happen: required the probability of the event E.”

This, which is strictly equivalent to Prof. Boole’s mode of stating
the question, may for convenience be called the Causation
statement. But his solution, presently to be spoken of, is rather
a solution of what may be termed the Concomitance statement
of the question: viz., if for shortness we use AE to denote the
compound event A and E, so in other cases; and if we use also
A′ to denote the non-occurrence of the event A, and so in other
cases (of course (AE)′, which denotes the non-occurrence of the
event AE, must not be confounded with the event A′E′, which
would denote the non-occurrence of each of the events A, E),
then the question is,


“Given
Prob. A′B′E = 0,
Prob. A = α,
Prob. AE = αp,
Prob. B = β,
Prob. BE = βq;


required the probability of E.” To show that the two statements
are really distinct questions, it may be observed that when A
and B both exist, then, according to the causation statement,
they may one or each of them act efficiently, and E may thus
happen as an effect of any one of them only, or as an effect of
each of them; but, according to the concomitance statement,
E cannot be attributed rather to one of the events A, B, than
to the other of them, or to both of them. The solution which I
gave in the year 1854 (Phil. Mag. vol. vii. p. 259) refers to the
causation statement of the question, and assumes the independence
of the two causes*; and on this assumption I believe it to
be correct. [pp. 352-353]


Cayley now rehearses his former solution (giving it essentially in the form 
given by Dedekind, as we shall see subsequently — even to the conditions 
p ≮ βq and q ≮ αp), and notes its inconsistency with Boole’s.

Boole replied to Cayley in the same issue, stating that 


I think that your solution is correct under conditions partly
expressed and partly implied. The one to which you direct
attention is the assumed independence of the causes denoted by
A and B. Now I am not sure that I can state precisely what the
others are; but one at least appears to me to be the assumed
independence of the events of which the probabilities according
to your hypothesis are αλ, βμ.

I think that every problem stated in the ‘causation’ form ad- 
mits, if capable of scientific treatment, of reduction to the ‘con- 
comitance’ form. I admit it would have been better, in stating 
my problem, not to have employed the word ‘cause’ at all. 
[pp. 361-362] 


Boole’s Laws of Thought was closely followed by a paper of 1854 by
Henry Wilbraham, the avowed aim¹⁰⁵ of which was


to show that Professor Boole ... tacitly assume[s] certain
conditions expressible by algebraical equations, over and above the
conditions expressed by the data of the problem, and to show
how these assumed conditions may be algebraically expressed.
[p. 465]


The first tacit assumption Wilbraham finds Boole to have made is that of 
the independence of the several simple events involved; the second is the 
apparent supersedence of the assumed conditions by a new given condition 
— and with the addition of “assumptions made when no condition besides 
the absolute chances of the simple events is given” [Boole 1952, p. 475]. 
Wilbraham considers the following simplified form of the “challenge 
problem”, given by Boole in Chapter XX of The Laws of Thought: 


the probabilities of two causes A_1 and A_2 are c_1 and c_2
respectively; the probability that if A_1 happen E will happen is p_1,
that if A_2 happen E will happen is p_2. E cannot happen if
neither A_1 nor A_2 happen. Required the probability of E.
[1854, p. 471]


* It is part of the assumption, that the causes do not combine to produce the
effect: viz. if they both act, the effect is not produced unless one of them acts
efficiently; they may or may not each of them act efficiently.

Denoting by ε “the chance of A_1 and A_2 both happening and being
followed by E” (op. cit., p. 472), Wilbraham deduces, without making any
assumptions, that

\[ u = \Pr[E] = c_1 p_1 + c_2 p_2 - \varepsilon , \]

where ε < min{c_1, c_2}.
Boole, Wilbraham declares, requires the following two assumptions for
the validity of his solution:

\[ \frac{\text{Prob. of } A_1,\ A_2, \text{ and } E \text{ all happening}}{\text{Prob. not } A_1,\ A_2,\ E} = \frac{\text{Prob. } A_1, \text{ not } A_2,\ E}{\text{Prob. not } A_1, \text{ not } A_2, \text{ not } E} , \]

and

\[ \frac{\text{Prob. } A_1,\ A_2, \text{ not } E}{\text{Prob. not } A_1,\ A_2, \text{ not } E} = \frac{\text{Prob. } A_1, \text{ not } A_2, \text{ not } E}{\text{Prob. not } A_1, \text{ not } A_2, \text{ not } E} , \]

and while he considers the second to be not unreasonable, the first is viewed
as “not only arbitrary but eminently anomalous” (op. cit. p. 478).
Cayley’s assumptions are seen by Wilbraham to be the following:

\[ \frac{\text{Prob. } A_1,\ A_2, \text{ not } E}{\text{Prob. not } A_1,\ A_2, \text{ not } E} = \frac{\text{Prob. } A_1, \text{ not } A_2, \text{ not } E}{\text{Prob. not } A_1, \text{ not } A_2, \text{ not } E} , \]

and

\[ \frac{\text{Prob. } A_1,\ A_2}{\text{Prob. not } A_1,\ A_2} = \frac{\text{Prob. } A_1, \text{ not } A_2}{\text{Prob. not } A_1, \text{ not } A_2} , \]

i.e. A_1 and A_2 are independent


first, in the case in which E does not happen; secondly, in the
case where it is not observed whether E does or does not
happen. [1854, p. 475]


Finding controversy disagreeable, it was only with “the most unfeigned
reluctance” that Boole replied [1854c] to Wilbraham’s comments¹⁰⁶. He
again notes the error in Cayley’s solution¹⁰⁷, while


On the other hand, I affirm without hesitation that there is no 
case in which the equations deduced by Mr. Wilbraham from
my method of solution can be proved to be erroneous. They 
do not, indeed, represent “hypotheses,” but they are legitimate 
deductions from the general principles upon which that method 
is founded, and it is to those principles directly that attention 
ought to be directed. [1854c, p. 90] 


The next to comment on the problem was Richard Dedekind, who, in
a paper published in 1855, defended Cayley against Boole, writing of the
latter’s comments


Man sieht indessen durchaus nicht, wo Cayley einen Fehler
gemacht hätte; und in der That ist seine Auflösung auch (bis
auf gewisse Beschränkungen, durch welche sie erst eindeutig
gemacht werden muß) streng richtig, selbst in dem eben
angeführten Fall; denn man findet leicht, daß α(1 − β) mit α
übereinstimmt, indem α Nichts Anderes als Null sein kann.
[1855, p. 269]


From (47) and (48) Dedekind deduces¹⁰⁸ that

\[ \rho = (1 - \alpha\beta + \alpha p + \beta q - \zeta)/2 , \]

where ζ is “die noch zweideutige” [p. 270] square-root to be found from

\[ \zeta^2 = (1 - \alpha\beta + \alpha p + \beta q)^2 - 4(1-\beta)\alpha p - 4(1-\alpha)\beta q - 4\alpha p \beta q . \]

Dedekind in fact concludes that the only necessary and sufficient condition
for the solution of the problem is that both differences p − βq and q − αp are
not negative. Specific attention is paid to the cases (discussed by Boole) in
which q = 0 or α = 0, and the agreement with the result obtained here is
noted.
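Dedekind's closed form is easily compared with Cayley's equations. The following sketch is illustrative only: the function name is hypothetical, and the sign of the square root is fixed by assumption to the positive value, which gives the smaller root of the underlying quadratic. For data generated from Cayley's own equations that root reproduces his total probability ρ.

```python
import math


def cayley_rho_dedekind(alpha, beta, p, q):
    """rho = (1 - alpha*beta + alpha*p + beta*q - zeta)/2, with zeta >= 0."""
    b = 1 - alpha * beta + alpha * p + beta * q
    zeta_sq = (b * b - 4 * (1 - beta) * alpha * p
               - 4 * (1 - alpha) * beta * q - 4 * alpha * p * beta * q)
    return (b - math.sqrt(zeta_sq)) / 2


# Generate p, q and rho from chosen lambda, mu (illustrative values).
alpha, beta, lam, mu = 0.5, 0.4, 0.3, 0.6
p = lam + (1 - lam) * mu * beta
q = mu + (1 - mu) * lam * alpha
rho = lam * alpha + mu * beta - lam * mu * alpha * beta
print(cayley_rho_dedekind(alpha, beta, p, q), rho)
```

Applied formally to p = 1, q = 0 the same function returns α(1 − β), the value to which Boole objected; the condition that q − αp be not negative then forces α = 0, so that α(1 − β) and α agree.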

In his Principles of the Algebra of Logic of 1879 Alexander MacFarlane 
provides a succinct discussion of Boole’s problem. His own work shows that 
“the probability required cannot be determined exactly from the data” 
[p. 154] and also allows the ready determination of the relations that exist 
among the data. Mention is also made of the solutions given by Cayley and 
Wilbraham, and MacFarlane concludes by noting that 


What is given by Boole’s solution is not the mathematical
probability of the event E, but the most probable value of the
probability which can be deduced from the given data. [p. 155]


In the fourth of a series of papers on the calculus of equivalent statements,
Hugh MacColl¹⁰⁹ obtains essentially Wilbraham’s solution

\[ \Pr[E] = c_1 p_1 + c_2 p_2 - \varepsilon , \quad \text{where now} \quad \varepsilon = \Pr[A_1 A_2]\,\Pr[E \mid A_1 A_2] . \tag{52} \]

If A_1 and A_2 are assumed to be independent, and if E is assumed to be
more probable when both A_1 and A_2 exist than when only one of them
exists, then

\[ \Pr[A_1 A_2] = \Pr[A_1]\,\Pr[A_2] \quad \text{and} \quad \Pr[E \mid A_1 A_2] > \max\{\Pr[E \mid A_1], \Pr[E \mid A_2]\} . \]

It thus follows that

\[ c_1 p_1 + c_2 p_2 - c_1 c_2 < \Pr[E] < \min\{c_1 p_1 + c_2 p_2 - c_1 c_2 p_1,\; c_1 p_1 + c_2 p_2 - c_1 c_2 p_2\} . \]
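The two limits correspond to the extreme admissible values of Pr[E | A_1 A_2], namely 1 and max{p_1, p_2}. A minimal sketch of the computation follows; the function name and the numerical values are assumptions made for illustration.

```python
def maccoll_bounds(c1, c2, p1, p2):
    """Limits for Pr[E] when A1, A2 are independent and Pr[E | A1 A2]
    lies between max(p1, p2) and 1."""
    base = c1 * p1 + c2 * p2
    lower = base - c1 * c2                # Pr[E | A1 A2] taken as 1
    upper = base - c1 * c2 * max(p1, p2)  # Pr[E | A1 A2] taken as max(p1, p2)
    return lower, upper


print(maccoll_bounds(0.5, 0.4, 0.6, 0.7))
```

Subtracting c_1 c_2 max{p_1, p_2} is the same as taking the smaller of the two expressions in the upper limit.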
By a numerical example, MacColl shows that Boole’s solution is wrong:
Hailperin [1986, p. 367] concludes though that MacColl had changed the 
problem “by having additional conditions which should be included in the 
data”, and the limits, including these conditions, found by Hailperin using 
Boole’s technique, do not in fact exhibit a flaw in the latter’s argument. 

MacColl also gives “a very simple proof of the fundamental rule in the 
Inverse Method of Probability” [p. 120] (the method, though not the nota- 
tion, is that usually used today to prove the discrete Bayes’s rule). 

In his sixth paper MacColl states that Boole’s “General Method” is
basically flawed¹¹⁰, “as it professes to obtain exact results from data which are
demonstrably insufficient” [1897, p. 562]. Once again he quotes the
problem, with the solution (notation altered) as given in (52). He then points
out that any one of the three following assumptions may be made:


1. Pr[E | A_1 ∧ A_2] = 0;
2. A_1 and A_2 are independent, and Pr[E | A_1 ∧ A_2] = 1;
3. A_1 and A_2 are independent, and
Pr[E | A_1 ∧ A_2] > min{Pr[E | A_1], Pr[E | A_2]}.


Thus the required chance varies on different hypotheses, each 
of which is consistent with the data of the problem; and, as the 
respective chances of the truth of these hypotheses are wholly 
unknown to us, we cannot infer that the required chance has a 
fixed or constant value calculable from the data. [p. 563] 


The problem, MacColl believes, lies in Boole’s definition of independence, 
and he quotes the following passages from The Laws of Thought: 


Two events are said to be independent when the probability of 
the happening of either of them is unaffected by our expectation 
of the occurrence or failure of the other. [p. 255] 

When the probabilities of events are given, but all information 
respecting their dependence withheld, the mind regards them 
as independent. [p. 256] 


Keynes [1921, p. 167] regards the first of these definitions as correct, but
finds it to be inconsistent with later developments (see also The Laws of
Thought, p. 258), from which, for instance, it seems to follow that if xz is
a possible event, then x and z are to be taken as independent¹¹¹.

Keynes [1921] considers Boole’s “challenge problem” in some detail. He
finds Boole’s solution to be wrong¹¹², the correct answer¹¹³ in fact being

\[ u = (c_1 p_1 + c_2 p_2)/(1 + z) \]

or

\[ u = (c_1 p_1 + c_2 p_2) - y , \]

where z = Pr[A_1 ∧ A_2 | E ∧ H], y = Pr[A_1 ∧ A_2 ∧ E | H], and where the other
probabilities are assumed to be similarly conditioned on H, the data of the
problem. Keynes also deduces bounds for u that are independent of y and
z, and are identical with those given by Boole for the roots of his equation.
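That the two forms agree follows from y = uz, and both can be confirmed on any joint distribution in which E cannot occur without one of the causes. The sketch below uses a hypothetical table of joint weights chosen only for illustration; the names `joint` and `pr` are not from Keynes.

```python
from fractions import Fraction

# Hypothetical joint weights Pr[A1 = a1, A2 = a2, E = e]; they sum to 1,
# and E has zero weight when neither cause is present.
joint = {
    (1, 1, 1): Fraction(3, 20), (1, 1, 0): Fraction(1, 20),
    (1, 0, 1): Fraction(4, 20), (1, 0, 0): Fraction(2, 20),
    (0, 1, 1): Fraction(3, 20), (0, 1, 0): Fraction(3, 20),
    (0, 0, 1): Fraction(0),     (0, 0, 0): Fraction(4, 20),
}


def pr(event):
    """Total weight of the outcomes (a1, a2, e) satisfying `event`."""
    return sum(w for outcome, w in joint.items() if event(*outcome))


u = pr(lambda a1, a2, e: e)                # Pr[E]
c1p1 = pr(lambda a1, a2, e: a1 and e)      # c1 * p1 = Pr[A1 and E]
c2p2 = pr(lambda a1, a2, e: a2 and e)      # c2 * p2 = Pr[A2 and E]
y = pr(lambda a1, a2, e: a1 and a2 and e)  # Pr[A1 and A2 and E]
z = y / u                                  # Pr[A1 and A2 | E]

print(u, c1p1 + c2p2 - y, (c1p1 + c2p2) / (1 + z))
```

All three printed values coincide, whatever admissible weights are substituted.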

Many, if not most, of Boole’s writings on probability are concerned with 
his general problem and various developments thereof. One paper, of 1857, 
for which he was awarded the Keith Prize, is concerned with the probabil- 
ities of testimonies: this paper Keynes [1921, chap. XVI, §6] considers to 
be Boole’s “most considered contribution to probability”. 

We now pass on to An Investigation of the Laws of Thought on which 
are founded the mathematical theories of logic and probabilities, published 
in 1854. Eschewing, albeit with difficulty, any discussion of the complete 
work¹¹⁴, we note merely that among a list of principles “chiefly taken from
Laplace” [p. 248], we find the following: 


6th. If an observed event can only result from some one of n 
different causes which are a priori equally probable, the proba- 
bility of any one of the causes is a fraction whose numerator is 
the probability of the event, on the hypothesis of the existence 
of that cause, and whose denominator is the sum of the similar 
probabilities relative to all the causes. [p. 249] 


This is clearly an inverse probability principle of the usual form. However 
Boole goes on to say 


the data are the probabilities of a series of compound events, 
expressed by conditional propositions [p. 250] 


and some confusion between the probability of a conditional, Pr[A → B],
and a conditional probability, Pr[B|A], seems apparent.

Like de Morgan (see §8.4), however, Boole usually (with an exception 
to be mentioned below) evaluates the probability of a “conditional propo- 
sition” as a ratio of (absolute) probabilities. The single exception occurs 
in his fifth example in Chapter XVIII, §7, the problem discussed there 
reducing essentially to 


\[ \text{Given } \Pr[A \mid B] = p , \quad \Pr[B \mid C] = q , \qquad \text{Find } \Pr[A \mid C] . \]

The solution given is

\[ \Pr[A \mid C] = pq + a(1-q) , \]

where


the arbitrary constant a is the probability that if the proposition
Z is true and Y false, X is true. [p. 285]

(Here the


Major premiss: If the proposition Y is true X is true.
Minor premiss: If the proposition Z is true Y is true.
Conclusion: If the proposition Z is true X is true.


correspond to our A, B and C respectively.)
Boole’s further discussion supposes (in essence) that if C obtains, then

\[ \Pr[B] = q , \quad \Pr[A \wedge B] = pq , \]

or that

\[ \Pr[B \mid C] = q , \quad \Pr[A \wedge B \mid C] = pq . \]

Combining this with what is originally given, we see that

\[ \Pr[A \wedge B \mid C] = \Pr[A \mid B]\,\Pr[B \mid C] . \]

Since, however, it is generally true that

\[ \Pr[A \wedge B \mid C] = \Pr[A \mid B \wedge C]\,\Pr[B \mid C] , \]

we see that Boole is assuming that

\[ \Pr[A \mid B \wedge C] = \Pr[A \mid B] ; \]

and under this assumption it is easy to deduce his result: indeed

\[
\begin{aligned}
\Pr[A \mid C] &= \Pr[A \wedge B \mid C] + \Pr[A \wedge \bar{B} \mid C] \\
              &= \Pr[A \mid B \wedge C]\,\Pr[B \mid C] + \Pr[A \mid \bar{B} \wedge C]\,\Pr[\bar{B} \mid C] \\
              &= \Pr[A \mid B \wedge C]\,q + a(1-q) \\
              &= pq + a(1-q) ,
\end{aligned}
\]

with a as specified before. The confusion mentioned earlier arises from
Boole’s equation of things like Pr[X → Y] with things like Pr[Y|X].
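The role of the assumption Pr[A | B ∧ C] = Pr[A | B] can be seen by building a small joint distribution in which it holds by construction. The sketch below is an illustration only: the parameters `c`, `q0` and `a0` (the probability of C, and the conditional probabilities when C fails) are hypothetical extras needed to complete the distribution and do not appear in Boole's problem.

```python
from fractions import Fraction
from itertools import product


def syllogism_check(c, q, q0, p, a, a0):
    """Return (Pr[A|B], Pr[B|C], Pr[A|C]) for a distribution on (C, B, A)
    constructed so that Pr[A | B, C] = Pr[A | B, not C] = p."""
    def weight(C, B, A):
        pc = c if C else 1 - c
        pb_given = q if C else q0
        pb = pb_given if B else 1 - pb_given
        if B:
            pa_given = p               # the assumed conditional independence
        else:
            pa_given = a if C else a0  # a = Pr[A | not B, C]
        pa = pa_given if A else 1 - pa_given
        return pc * pb * pa

    states = list(product((0, 1), repeat=3))

    def pr(event, given):
        num = sum(weight(*s) for s in states if given(*s) and event(*s))
        den = sum(weight(*s) for s in states if given(*s))
        return num / den

    return (pr(lambda C, B, A: A, lambda C, B, A: B),
            pr(lambda C, B, A: B, lambda C, B, A: C),
            pr(lambda C, B, A: A, lambda C, B, A: C))


p, q, a = Fraction(3, 5), Fraction(2, 3), Fraction(1, 5)
print(syllogism_check(Fraction(1, 2), q, Fraction(1, 4), p, a, Fraction(1, 2)))
print(p * q + a * (1 - q))
```

The third component of the returned triple equals pq + a(1 − q) regardless of the values chosen for the hypothetical extras.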

We turn our attention next to the twentieth chapter¹¹⁵, entitled
“Problems relating to the connexion of causes and effects.” Several of the
problems discussed here are developments of those already discussed in this
monograph, and we shall therefore pass on directly to the ninth problem.
This is phrased as follows: 


Assuming the data of any of the previous problems, let it be
required to determine the probability that if the event E present
itself, it will be associated with the particular cause A_r; in other
words, to determine the a posteriori probability of the cause A_r
when the event E has been observed to occur. [pp. 356-357]

As in Boole’s preceding problems let us set

\[ \Pr[A_j] = c_j , \quad \Pr[E \mid A_j] = p_j , \quad j \in \{1, 2, \ldots, n\} . \]

Then a simple application of Bayes’s Rule gives

\[ \Pr[A_i \mid E] = c_i p_i \Big/ \sum_{j} c_j p_j \]

as Boole states¹¹⁶.
More relevant, however, is Problem X, one that Boole describes as “of a
much easier description than the previous ones” [p. 358]. This runs¹¹⁷


The probability of the occurrence of a certain natural
phenomenon under given circumstances is p. Observation has also
recorded a probability a of the existence of a permanent cause
of that phenomenon, i.e. of a cause which would always
produce the event under the circumstances supposed. What is the
probability that if the phaenomenon is observed to occur n times
in succession under the given circumstances, it will occur the
n + 1th time? What also is the probability, after such
observation, of the existence of the permanent cause referred to?
[p. 358]


Boole provides two methods of solution to the first question. The first of
these is complicated: the second, attributed to Donkin¹¹⁸, runs as follows:
let Pr[E] = p, Pr[C] = a and Pr[E | \bar{C}] = x. Then p = a + (1 − a)x, and
hence x = (p − a)/(1 − a). The a priori probability of the occurrence of the
event n times being 1 (if C exists) or x^n (if \bar{C} obtains) we have¹¹⁹

\[ \Pr[C \mid x_1, \ldots, x_n] = a / [a + (1-a)x^n] \]

\[ \Pr[\bar{C} \mid x_1, \ldots, x_n] = (1-a)x^n / [a + (1-a)x^n] . \]

Hence the probability of another occurrence is

\[ \{a / [a + (1-a)x^n]\} \cdot 1 + \{(1-a)x^n / [a + (1-a)x^n]\} \cdot x . \]

On replacing x by its value (p − a)/(1 − a) we obtain the result

\[ \left[ a + (p-a)^{n+1}/(1-a)^{n} \right] \Big/ \left[ a + (p-a)^{n}/(1-a)^{n-1} \right] , \]

the solution to the second question being a divided by the above
denominator.
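Donkin's argument translates directly into a few lines of exact arithmetic. The sketch below is illustrative (the function names and the sample values p = 3/5, a = 1/5, n = 3 are assumptions). It computes the posterior probability of the permanent cause and the probability of a further occurrence as a two-component mixture, and compares the latter with the closed form in p and a.

```python
from fractions import Fraction


def donkin(p, a, n):
    """Posterior probability of the permanent cause after n occurrences,
    and the probability that the event occurs once more."""
    x = (p - a) / (1 - a)                  # Pr[E | no permanent cause]
    post_cause = a / (a + (1 - a) * x**n)
    next_occurrence = post_cause * 1 + (1 - post_cause) * x
    return post_cause, next_occurrence


def closed_form(p, a, n):
    """The same predictive probability written directly in terms of p and a."""
    return ((a + (p - a)**(n + 1) / (1 - a)**n)
            / (a + (p - a)**n / (1 - a)**(n - 1)))


p, a, n = Fraction(3, 5), Fraction(1, 5), 3
print(donkin(p, a, n), closed_form(p, a, n))
```

With these values x = 1/2, the posterior probability of the permanent cause is 2/3, and both routes give 5/6 for the next occurrence.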

Keynes [1921, §XVII.2] shows that common techniques allow the
(comparatively) easy derivation of Boole’s results under the adjoining of the
following condition to the assumptions stated:

\[ \Pr[x_i \mid x_1, x_2, \ldots, x_{i-1}, \bar{C}] = \Pr[x_i \mid \bar{C}] : \]

a somewhat simpler form of Keynes’s derivation may be found in Hailperin
[1986, pp. 406-407].

Boole now proceeds to consider the usual mode of approach to such
problems, whereby the “necessary arbitrariness of the solution” [p. 368] is
evaded¹²⁰. This is exemplified by the case of the sun’s rising¹²¹: let p be an
unknown probability and c (infinitesimal and constant) be the probability
that the probability of the sun’s rising lies between p and p + dp. Then the
probability that the sun will rise m times in succession is

\[ c \int_0^1 p^m \, dp , \]

and hence the probability of one further rise, given m rises in succession,
is

\[ c \int_0^1 p^{m+1} \, dp \Big/ c \int_0^1 p^m \, dp = (m+1)/(m+2) . \]


Boole however rejects the principle “of the equal distribution of our
knowledge, or rather of our ignorance” [p. 370], on account of its
arbitrary nature¹²². He notes that different hypotheses may lead to the same
result¹²³, while other hypotheses, as strictly involving this principle, may
conduct to other conflicting conclusions. As an illustration of the latter
possibility Boole considers the drawing of balls from a bag containing an
infinite number of black or white balls, under the assumption that “all
possible constitutions of the system of balls are equally probable” [p. 370]. We
seek the probability of getting a white ball on the (m+1)th drawing given
that the m previous draws all yielded white balls.

This problem Boole solves in two ways: the first (and shorter) of these
relies on his logical approach to probability¹²⁴, while the second proceeds
in the more usual style as follows: suppose initially that the urn contains
μ balls and that sampling proceeds with replacement, all constitutions of
the system being a priori equally likely. Then the probability of obtaining
r white and p − r black balls in p drawings, irrespective of order and under
the assumption that the urn contains n white balls, is

\[ \binom{p}{r} \left(\frac{n}{\mu}\right)^{r} \left(1 - \frac{n}{\mu}\right)^{p-r} . \]

Since the probability that exactly n balls are white is \binom{\mu}{n}/2^{\mu} (the
number of possible constitutions of the system being 2^{\mu}), it follows that the
(unconditional) probability of obtaining r white balls is

\[ \sum_{n=0}^{\mu} \binom{p}{r} \binom{\mu}{n} \left(\frac{n}{\mu}\right)^{r} \left(1 - \frac{n}{\mu}\right)^{p-r} \Big/ 2^{\mu} . \]

Using the Heaviside D operator and the Newton Series Boole shows that,
for large values of μ, this probability reduces to \binom{p}{r}/2^{p}. On our setting
p = r = m, the probability that the (m+1)th drawing will yield a white
ball after the first m draws have yielded white, is found to be 1/2.

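An easier verification of the limit \binom{p}{r}/2^{p} than that advanced by Boole,
though one that requires results that were unavailable when he wrote, is
provided by using S.N. Bernstein’s version of the Weierstraß Approximation
Theorem¹²⁵. To this end, let f be a function on [0,1] and consider the
Bernstein polynomial of degree n associated with f and defined by

\[ B_n(x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^{k} (1-x)^{n-k} , \quad n \in \mathbb{N} . \]

If f is continuous, then B_n(x) converges uniformly to f(x) on [0,1]. In
Boole’s notation, set

\[ f(x) = x^{r} (1-x)^{p-r} . \]

Then

\[ B_{\mu}(x) = \sum_{n=0}^{\mu} \left(\frac{n}{\mu}\right)^{r} \left(1 - \frac{n}{\mu}\right)^{p-r} \binom{\mu}{n} x^{n} (1-x)^{\mu-n} \]

converges uniformly on [0,1] to

\[ f(x) = x^{r} (1-x)^{p-r} . \]

On setting x = 1/2, we get

\[ B_{\mu}(1/2) = \sum_{n=0}^{\mu} \binom{\mu}{n} \left(\frac{n}{\mu}\right)^{r} \left(1 - \frac{n}{\mu}\right)^{p-r} \left(\frac{1}{2}\right)^{\mu} , \]

with (uniform) convergence to f(1/2) = 1/2^{p}. Thus Boole’s sum

\[ \sum_{n=0}^{\mu} \binom{p}{r} \binom{\mu}{n} \left(\frac{n}{\mu}\right)^{r} \left(1 - \frac{n}{\mu}\right)^{p-r} \Big/ 2^{\mu} = \binom{p}{r} B_{\mu}(1/2) \]

converges uniformly on [0,1] to \binom{p}{r}/2^{p}.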
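The limit can also be watched numerically. The sketch below is an illustration rather than either Boole's or Bernstein's argument: it evaluates the sum exactly for a finite number μ of balls (the function name `boole_sum` and the chosen values of μ, p and r are assumptions) and compares it with C(p, r)/2^p. The ratio of the values for p = r = m + 1 and p = r = m gives the probability of a further white ball.

```python
from fractions import Fraction
from math import comb


def boole_sum(mu, p, r):
    """Exact probability of r white balls in p draws with replacement when
    all 2**mu constitutions of a bag of mu balls are equally likely."""
    total = sum(comb(mu, n) * n**r * (mu - n)**(p - r) for n in range(mu + 1))
    return comb(p, r) * Fraction(total, mu**p * 2**mu)


for mu in (10, 100, 400):
    print(mu, float(boole_sum(mu, 3, 1)), comb(3, 1) / 2**3)

# Probability of a fourth white ball after three whites, for mu = 400.
print(float(boole_sum(400, 4, 4) / boole_sum(400, 3, 3)))
```

Integer arithmetic is used throughout so that the large binomial coefficients cause no loss of precision; only the final values are converted to floating point.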

In Chapter XXI the general method discussed earlier in the work is 
applied to the question of the probability of judgements¹²⁶. Perhaps all
that need be said here is to repeat Boole’s statement that “It is apparent 
that the whole inquiry is of a very speculative character” [p. 379]. 
8.18 Charles Hughes Terrot (1790-1872)


In 1853 Terrot¹²⁷ published a paper under the title “Summation of a
compound series, and its application to a problem in probabilities.” It is the
application that is of particular interest here, concerning as it does the rule
of succession.

The series referred to in the title of the paper may be written

\[
\begin{aligned}
\sum_{i=0}^{m-p-q} (m-q-i)_p \, (q+i)_q
  &= p!\,q! \sum_{i=0}^{m-p-q} \binom{m-q-i}{p} \binom{q+i}{q} \\
  &= p!\,q! \binom{m+1}{p+q+1} ,
\end{aligned}
\]

using an identity from Feller [1957]†. Having established this result,
Terrot turns his attention in the second section of his paper to the following
problem:


Suppose an experiment concerning whose inherent probability 
of success we know nothing, has been made p+ q times, and has 
succeeded p times, and failed q times, what is the probability 


of success on the p+ q+ 1’ trial. [p. 542] 


To realize this problem Terrot considers the case of a bag containing m
balls, all either black or white, but in unknown proportions¹²⁸. From this
bag p white and q black balls have been drawn. Then the following four
cases present themselves [Terrot 1853, p. 543]:


1. m may be given, and the balls drawn may have been replaced in the
bag;

2. m may be given, and the balls drawn not replaced;

3. m may be infinite or indefinite, and the balls replaced;

4. m may be infinite or indefinite, and the balls not replaced.


In this paper Terrot solves the second case (in which the fourth is
subsumed) and makes an attempt at the first case (the third has the
well-known solution (p + 1)/(p + q + 2)).

Denoting by E the observed event, and by H_i the hypothesis that the
bag contains initially (m − q − i) white and (q + i) black balls, with i in
the set {0, 1, ..., m − q − p}, we have

\[ \Pr[E \mid H_i] = p!\,q! \binom{m-q-i}{p} \binom{q+i}{q} \Big/ (m)_{p+q} , \]

where order is taken into account¹²⁹. Under the assumption that all
(possible) initial compositions of the bag are equally probable, we have, by an
application of a discrete form of Bayes’s Theorem,

\[ \Pr[H_i \mid E] = \binom{m-q-i}{p} \binom{q+i}{q} \Big/ \sum_{j=0}^{m-p-q} \binom{m-q-j}{p} \binom{q+j}{q} . \]

† Recall that (x)_n = x(x − 1)...(x − n + 1).

Since

\[ \Pr[\text{white ball drawn} \mid E \,\&\, H_i] = (m-p-q-i)/(m-p-q) , \]

it follows that

\[
\begin{aligned}
\Pr[\text{white ball drawn} \,\&\, H_i \mid E]
  &= \Pr[\text{white ball drawn} \mid E \,\&\, H_i] \, \Pr[H_i \mid E] \\
  &= \frac{m-p-q-i}{m-p-q} \binom{m-q-i}{p} \binom{q+i}{q} \Big/ \sum_{j=0}^{m-p-q} \binom{m-q-j}{p} \binom{q+j}{q} \\
  &= \frac{m-p-q-i}{m-p-q} \binom{m-q-i}{p} \binom{q+i}{q} \Big/ \binom{m+1}{p+q+1} .
\end{aligned}
\]

Thus

\[
\begin{aligned}
\Pr[\text{white ball drawn} \mid E]
  &= \sum_{i=0}^{m-p-q} \frac{m-p-q-i}{m-p-q} \binom{m-q-i}{p} \binom{q+i}{q} \Big/ \binom{m+1}{p+q+1} \\
  &= \frac{p+1}{m-p-q} \sum_{i=0}^{m-p-q} \binom{m-q-i}{p+1} \binom{q+i}{q} \Big/ \binom{m+1}{p+q+1} \\
  &= \frac{p+1}{m-p-q} \binom{m+1}{p+q+2} \Big/ \binom{m+1}{p+q+1} \\
  &= (p+1)/(p+q+2) .
\end{aligned}
\]

This, the solution to Terrot’s second case, being independent of m, is clearly
also the answer to the fourth case.
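The independence of m can be confirmed by direct enumeration. The sketch below is illustrative and makes its assumptions explicit: the hypotheses are indexed by the initial number w of white balls, all m + 1 compositions are taken as equally probable a priori, and m must exceed p + q so that a further draw is possible. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def terrot_without_replacement(m, p, q):
    """Probability that the next ball is white after p white and q black
    have been drawn without replacement from a bag of m balls."""
    # Likelihood of the draws, up to a common factor, when the bag holds
    # w white balls; comb() is zero when too few balls of a colour exist.
    weights = {w: comb(w, p) * comb(m - w, q) for w in range(m + 1)}
    total = sum(weights.values())
    return sum(Fraction(wt, total) * Fraction(w - p, m - p - q)
               for w, wt in weights.items() if wt)


for m in (6, 10, 25):
    print(m, terrot_without_replacement(m, 2, 3))
```

For p = 2 and q = 3 every admissible m gives 3/7, in agreement with (p + 1)/(p + q + 2).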

Terrot now turns his attention to the first case, noting firstly that the
main object here is the summation of the series

\[ (m-1)^{p} \times 1^{q} + (m-2)^{p} \times 2^{q} + \cdots + 1^{p} \times (m-1)^{q} . \]

He discusses in detail the specific case p = 2, q = 3: we shall give a more
general discussion.
Suppose, then, that (p + q) draws (with replacement) from m balls have
resulted in p white and q black balls (event E). If there are r white balls
in the bag, the probability of E is

\[ \binom{p+q}{p} \left(\frac{r}{m}\right)^{p} \left(1 - \frac{r}{m}\right)^{q} , \]

while the probability that one further draw results in a white ball is r/m.
Thus¹³⁰

\[
\begin{aligned}
\Pr[\text{white ball drawn} \mid E]
  &= \sum_{r} (r/m)^{p+1} (1 - r/m)^{q} \Big/ \sum_{r} (r/m)^{p} (1 - r/m)^{q} \\
  &= (1/m) \sum_{r} r^{p+1} (m-r)^{q} \Big/ \sum_{r} r^{p} (m-r)^{q} .
\end{aligned}
\]

Having obtained this result, Terrot finally points out that in the limit
as m tends to infinity this result approaches (p + 1)/(p + q + 2), as is of
course expected (see §§8.14 and 8.22 for details of the limiting process).
This observation concludes the paper.
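The approach to the limit is quick, as a short computation shows. In the sketch below the sums are taken over r = 0, 1, ..., m, an assumption that makes no difference when p and q are both positive since the end terms then vanish; the function name is hypothetical.

```python
from fractions import Fraction


def terrot_with_replacement(m, p, q):
    """Probability of a further white ball after p white and q black in
    p + q draws with replacement, all compositions equally probable."""
    num = sum(r**(p + 1) * (m - r)**q for r in range(m + 1))
    den = m * sum(r**p * (m - r)**q for r in range(m + 1))
    return Fraction(num, den)


for m in (5, 50, 2000):
    print(m, float(terrot_with_replacement(m, 2, 3)), 3 / 7)
```

For Terrot's case p = 2, q = 3 the values settle rapidly on 3/7.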


8.19 Anton Meyer (1802-1857) 


In 1856 Meyer¹³¹ published a paper entitled “Note sur le théorème inverse
de Bernoulli”, in which he noted, in addition to the theorem mentioned in
the title, the results of Bayes and Laplace. His own note was devoted to the
direct proof, given by Laplace, of this inverse Bernoulli theorem, and the
main result runs as follows: let x_1 and x_2 be the unknown probabilities of
two complementary events A_1 and A_2. If, in a large number μ = m_1 + m_2 of
trials, A_1 and A_2 occur m_1 and m_2 times respectively, then the probability
that x_1 lies within the limits

\[ \frac{m_1}{\mu} \mp \frac{\gamma}{\mu} \sqrt{\frac{2(\mu - m_1)m_1}{\mu}} \]

will be

\[ \frac{2}{\sqrt{\pi}} \int_0^{\gamma} e^{-t^2} \, dt \]

to terms of order 1/μ. We shall not pause to discuss this result here, but
shall pass on immediately to a longer work.

Meyer’s Essai sur une Exposition nouvelle de la Théorie analytique des
Probabilités à posteriori appeared in 1857. His avowed aim in writing this
monograph is expressed in the foreword as follows:


en écrivant cet essai, j’ai eu primitivement en vue la nécessité de
rendre plus rigoureux les calculs, et de concentrer les méthodes
et les principes dans l’exposition de la théorie des probabilités
a posteriori.

Whether he was altogether successful in attaining this goal will become
clear as we discuss that part of this work that is pertinent to our purpose.

The second part of this Essai is entitled “Théorèmes de Bayes et de
Laplace sur la probabilité des causes.” Here Meyer discusses, in addition to
the two results mentioned in the title, theorems by Bernoulli and Poisson
and an inverse Bernoulli theorem. We shall discuss these results seriatim.

Denoting by y = f(x) the probability of an event depending upon the
unknown x (where x is called the “cause” of that event), Meyer states as
a theorem due to Bayes the following result:


a et b désignant les limites de toutes les valeurs possibles de
x, si y = fx est la probabilité de l’une quelconque des valeurs de
x, regardée comme certaine, je dis que l’on aura une probabilité

\[ P = \int_{\alpha}^{\beta} y \, dx \Big/ \int_{a}^{b} y \, dx , \]

que l’inconnue x est comprise dans les limites α et β. [p. 19]


Now it seems rather curious to attribute this result, in which no mention
is made of the number of occurrences or failures of the event, to Bayes. In
fact, the expression given seems to be only Pr[α < X < β | a < X < b]
where X is a random variable with probability density function f.

Two corollaries to this result are given. The first states 


La probabilité p d’une valeur unique de x est par conséquent
exprimée par

\[ p = y \, dx \Big/ \int_{a}^{b} y \, dx , \]

[p. 20]


which is just Pr[x < X < x + dx | a < X < b]. The second corollary runs:


Soit z = φx la probabilité d’un évenement futur, due à la cause
x, et y = fx la probabilité d’un évenement observé, soit P_1 la
probabilité de l’évenement futur en vertu de la cause dont la
probabilité est la valeur p ci-dessus, nous aurons évidemment

\[ P_1 = pz = yz \, dx \Big/ \int_{a}^{b} y \, dx . \]

Donc si π exprime la probabilité que l’évenement futur arrivera
en vertu de l’une des causes x = […], nous aurons

\[ \pi = \int_{\alpha}^{\beta} yz \, dx \Big/ \int_{a}^{b} y \, dx . \]

[pp. 20-21]

This corollary is recognizable as an extension of Meyer’s first theorem in
the same way that Price’s result extended Bayes’s.

The second theorem, attributed to Laplace, that Meyer proves is the 
following: 


x étant la cause inconnue d’un événement composé, dont la
probabilité est

\[ y = (fx)^{s} , \]

si m désigne la valeur de x qui rend y un maximum, je dis qu’en
supposant s très-grand, on aura, aux quantités près de l’ordre
1/s, une probabilité

\[ P = \frac{2}{\sqrt{\pi}} \int_0^{\gamma} e^{-t^2} \, dt \]

que l’inconnue, ou la cause x, est comprise entre les limites

\[ m \mp \frac{\gamma}{\sqrt{-\dfrac{s f''m}{2 f m}}} = m \mp \frac{\gamma}{\sqrt{-\dfrac{d^2 \log y}{2\,dx^2}}} . \]

[p. 21]


(Here “log” denotes the natural logarithm.) The proof given (which makes 
use of Meyer’s version of Bayes’s Theorem) is long and involved, and will 
not be presented here. The result, however, seems correct¹³². The proof is
succeeded by the following three remarks: 


(i) if P remains constant, the limits contract as s increases;

(ii) the limits remaining constant, which requires that γ increases as s
increases, the probability P tends to 1 as s → ∞;

(iii) by increasing s one may therefore contract the limits and
simultaneously increase P: for s = ∞, we have x = m and P = 1.


Meyer is not reluctant to blow his own trumpet: before stating his second 
theorem he writes 


quoique mes déductions procédent au fond des idées de Laplace, 
elles sont a la fois plus claires et plus rigoureuses que celles de 
cet auteur. [p. 21] 


The third result cited is the inverse Bernoulli theorem¹³³, viz.


x et 1 − x désignant les probabilités simples et inconnues de
deux évenements contraires A et B, en supposant que A arrive
p fois, et B q fois en un très-grand nombre μ = p + q d’épreuves,
je dis qu’on aura la probabilité

$$P = \frac{2}{\sqrt{\pi}} \int_0^{\gamma} e^{-t^2}\,dt$$

que x est compris entre
[2 
P + ee haladed [sic] 
BH MV pf 
[p. 28]. 


Notice here that the probability x is supposed unknown, in contrast to its 
appearance in a “known” capacity in the (direct) Bernoulli Theorem. Once 
again Meyer makes use of his first theorem in the proof, and indeed this 
result appears essentially as a special case of the second theorem. 

The fourth theorem is attributed to Bernoulli, and is stated as follows: 


x, et 1 − x étant les probabilités simples, supposées constantes
et connues des événements contraires A et B, le rapport m/s
du nombre de fois m que A arrivera le plus probablement en
un très-grand nombre s d’épreuves, à ce nombre s, est, aux
quantités près de l’ordre 1/s, compris entre les limites

$$x \pm \gamma \sqrt{\frac{2x(1 - x)}{s}}$$

avec une probabilité

$$\varpi = \frac{2}{\sqrt{\pi}} \int_0^{\gamma} e^{-t^2}\,dt + \frac{e^{-\gamma^2}}{\sqrt{2\pi s x(1 - x)}}.$$

[p. 30]


The proof of this result is unexceptionable: the theorem however is in fact 
not that given by Bernoulli — indeed Meyer’s statement owes far more to 
de Moivre than to Bernoulli¹³⁴.

Finally Meyer discusses Poisson’s theorem, which differs from Bernoulli’s
result in as much as the probabilities of the individual events are no longer 
required to be the same. 

One might perhaps summarize this section of the monograph by say- 
ing that, while Meyer provides useful and accurate proofs of the theorems 
stated, he is somewhat less than careful in his eponymy. 


8.20 Albert Wild 


In 1862, in a work entitled “Die Grundsätze der Wahrscheinlichkeits-Rechnung
und ihre Anwendung”, Wild quotes Bayes on the probability of causes:
the reference appears in connexion with a simple discrete form of Bayes’s
Theorem, but Wild does not attribute this result to Bayes. He passes on, in
the section on “Die Wahrscheinlichkeit die Naturereignisse” to the formula

$$x^m (1 - x)^n\,dx \Big/ \int_0^1 x^m (1 - x)^n\,dx$$

and then gives the rule of succession. The extended form to r and s future
occurrences of events of two (only possible) types is discussed. Finally we
find Bayes’s result

$$\int x^m (1 - x)^n\,dx \Big/ \int_0^1 x^m (1 - x)^n\,dx$$

and the limiting form

$$\frac{2}{\sqrt{\pi}} \int e^{-t^2}\,dt.$$


8.21 John Venn (1834-1923) 


From one who was primarily a philosopher rather than a mathematician¹³⁵
one might be surprised to find statistical work emanating¹³⁶. Yet in his
book The Logic of Chance¹³⁷, first published in 1866, Venn strongly advocated
the frequency concept of probability on which so much of “classical”
statistics depends — a concept based on a series that “combines individual 
irregularity with aggregate regularity” [Venn 1962, p. 4]¹³⁸.

In the fourth chapter of his book, in which he considers modes of estab- 
lishing certain properties of these series, Venn discusses (i) the meaning to 
be attached to the phrase “equally likely” and (ii) the Principle of Sufficient 
Reason, a rule in which he finds 


very great doubts whether a contradiction is not involved when 
we attempt to extract results from it. [p. 82] 


In Chapter VI, entitled “The subjective side of probability. Measurement
of belief”, Venn expresses the views of de Morgan and Donkin (according to
which views probability is defined with reference to belief), exposes various 
difficulties that arise in trying to assimilate these views, and reiterates his 
opinion that 


all which Probability discusses is the statistical frequency of 
events, or, if we prefer so to put it, the quantity of belief with 
which any one of these events should be individually regarded, 
but leaves all the subsequent conduct dependent upon that fre- 
quency, or that belief, to the choice of the agents. [p. 137] 


Furthermore 


The subjective side of Probability therefore, though very in- 
teresting and well deserving of examination, seems a mere ap- 
pendage of the objective, and affords in itself no safe ground 
for a science of inference. [p. 138] 

In Chapter VII Venn turns his attention to inverse probability, a concept
that he had defined in an earlier chapter as “the determination of the nature 
of a cause from the nature of the observed effect” [p. 109]. Arguing that 
the distinction between direct and inverse probability should be abandoned, 
Venn illustrates his point with the usual sort of “balls and bag” examples, 
and concludes that any such distinction either vanishes or¹³⁹


merely resolves itself into one of time, which, ... is entirely
foreign to our subject. [p. 185] 


A ground for rejecting the inverse argument is the use of the entirely arbi- 
trary “equally likely” assumption. 

Venn now turns his attention to the rule of succession¹⁴⁰ (a term introduced
by him himself), his eighth chapter¹⁴¹ containing what Jaynes has
described as¹⁴²



an attack on Laplace’s rule of succession, so viciously unfair that 
even Fisher (1956) was impelled to come to Laplace’s defense 
on this issue. [1976, p. 242] 


This rule, says Venn, is generally stated as follows: 


“To find the chance of the recurrence of an event already ob- 
served, divide the number of times the event has been observed, 
increased by one, by the same number increased by two.” 


[p. 196] 


He states, without proof, the customary result (m+ 1)/(m + 2) for a 
“balls and bag” example, and goes on to say that 


Then comes in the physical assumption that the universe may 
be likened to such a bag as the above, in the sense that the 
above rule may be applied to solve this question:— an event 
has been observed to happen m times in a certain way, find the 
chance that it will happen in that way next time [p. 197], 


illustrating this with examples from Laplace and de Morgan. Venn concludes
that “It is hard to take such a rule as this seriously” [p. 197]¹⁴³.

Venn returns to the subject of inverse probability in his tenth chap- 
ter, pointing out the needments for deciding whether an event has been 
produced by chance or by design, i.e. 


(1) The relative frequency of the two classes of agencies, viz. 
that which is to act in a chance way and that which is to act 
designedly. (2) The probability that each of these agencies, if 
it were the really operative one, would produce the event in 
question. [p. 249] 

While the probability instanced in the second case is generally readily
obtainable, the frequencies needed in (1) present a severe problem to an
adherent to the frequency theory of probability, but Venn concludes that such
problems “are at least intelligible even if they are not always resolvable” 
[p. 258]. 

Like so many writers Venn devotes some thought (see his Chapters XVI 
and XVII) to the application of probability to testimony: his conclusion is 
that such problems ought not to be considered as questions in probability, 
a decision that is perhaps understandable in the light of a frequentist flame. 

Venn’s work on probability did not pass without comment. Thus Edge- 
worth [1884b], while agreeing in the main with Venn’s objective approach, 
suggested that the latter’s 


logical scepticism has often carried him too far from the position 
held by the majority of previous writers upon Chance. [p. 224] 


Pearson [1920a] draws attention to Venn’s criticism of inverse probabilities, 
a criticism apparently based on an “objection to the principle of equal 
distribution of ignorance” [p. 2], and one that Pearson finds curious in the 
light of Venn’s approach to the problem of the effect of Lister’s method. 
This argument receives further attention in the first appendix to Pearson’s 
paper of 1928, while more recently Jaynes has pointed out a curiosity in 
Venn’s thinking, viz. 


How is it possible for one human mind to reject Laplace’s rule 
of succession; and then advocate a frequency definition of prob- 
ability? Anybody who assigns a probability to an event equal 
to its observed frequency in many trials, is doing just what 
Laplace’s rule tells him to do. [1976, p. 242] 


Support for Venn’s approach was given by Fisher [1922], who, regarding 
inverse probability as a “fundamental paradox”, paid tribute to the crit- 
icisms of Boole, Venn and Chrystal, as having “done something towards 
banishing the method, at least from the elementary text-books of Algebra” 
[p. 311]. He also comments on the “decisive criticism” of these three au- 
thors of “the baseless character of the assumptions made under the titles 
of inverse probability and Bayes’ Theorem” [p. 326]. Fisher’s remarks, in
turn, have been critically examined by Zabell [1989a].


8.22 William Allen Whitworth (1840-1905) 


Although known in his lifetime as a writer on religious and mathematical 
topics, Whitworth is perhaps remembered today mainly as an inveterate 
setter and solver of exercises and questions in probability. The lectures he 
delivered to women at Queen’s College, Liverpool, in 1866 were clearly and 
carefully elaborated into Choice and Chance, Two Chapters of Arithmetic; 
with an appendix containing the algebraical treatment of permutations and
combinations newly set forth, a book that first appeared in 1867 and that
grew considerably in size over subsequent editions¹⁴⁴. Solutions of many
of the exercises were given in his DCC Exercises, Including Hints for the
Solution of All the Questions in Choice and Chance of 1897.

We have discussed Whitworth’s contribution to the solution of a problem 
on Lister’s method in §9.7; here we shall restrict our attention to those 
exercises in the fifth (and last) edition of Choice and Chance that deal 
with our topic¹⁴⁵.

Like Bayes, Whitworth states that he will regard chance and probability 
as synonymous, and he also stresses that all probability is conditional, it 
always being dependent on the degree of one’s ignorance. The following 
passage from his DCC Exercises is worth noting:


Chance has to do altogether with what we have reason to ex- 
pect. It therefore depends upon our knowledge or upon our ig- 
norance. It is a function of our knowledge, but that necessarily 
a limited and imperfect knowledge. This is a point which both 
Dr Venn and Prof. Chrystal appear to me to miss. [p. xxii] 


Basic to many of his solutions is 
RULE IX. 


If a doubtful event can happen in a number of different ways, 
any accession of knowledge concerning the event which changes 
the probability of its happening will change, in the same ratio, 
the probability of any particular way of its happening. [p. 162] 


Now let us turn to the pertinent exercises¹⁴⁶.


Question 134. A bag contains five balls, which are known to 
be either all black or all white — and both these are equally 
probable. A white ball is dropped into the bag, and then a ball 
is drawn out at random and found to be white. What is now 
the chance that the original balls were all white? [p. 164] 


Letting H₁ and H₂ denote the initial compositions (B, B, B, B, B) and
(W, W, W, W, W) respectively, we have

$$\Pr[H_1] = \tfrac{1}{2} = \Pr[H_2],$$

the addition of the white ball not affecting these chances. Let H₁′ and H₂′
denote the possible compositions after the addition of the white ball. Then
Pr[H₂ | W_d] = Pr[H₂′ | W_d], and hence

$$\Pr[H_2 \mid W_d] = \frac{\Pr[W_d \mid H_2']}{\Pr[W_d \mid H_1'] + \Pr[W_d \mid H_2']} = \frac{1}{1 + 1/6} = \frac{6}{7},$$

where W_d denotes the drawing of a white ball.
Whitworth’s solution is long and drawn out, being related to his Rule
IX as follows:


The à priori probability that all [balls] are white is 1/2, and then
the chance of drawing a white ball is 1 (or certainty). Hence the
chance of the event happening in this way is 1/2 × 1, or 1/2.

So the à priori probability that the first five were black is
1/2 and then the chance of drawing a white ball is 1/6. Hence the
chance of the event happening in this way is 1/2 × 1/6, or 1/12.
[p. 164]


The total a priori chance of the happening of the event is then 7/12: the
drawing of a white ball increasing this to 1, i.e. the chance is increased in
the ratio 7 : 12. The chances of the event’s happening in the two different
ways are increased in the same ratio (by Rule IX), and the a posteriori
chances of the event’s happening in these two ways are 1/2 × 12/7 = 6/7 and
1/12 × 12/7 = 1/7 respectively.
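
Whitworth's Rule IX amounts to rescaling the a priori chance of each "way" by a common factor once the event is known to have happened. The following is a minimal sketch of that rescaling for Question 134, using exact fractions; the function name `rule_ix` and the dictionary labels are illustrative assumptions, not Whitworth's notation.

```python
from fractions import Fraction


def rule_ix(ways):
    """Rescale the a priori chance of each 'way' once the event is known to have happened."""
    total = sum(ways.values())  # a priori chance of the event itself
    return {way: chance / total for way, chance in ways.items()}


half = Fraction(1, 2)
ways_q134 = {
    "all white": half * 1,               # a white ball is then drawn with certainty
    "all black": half * Fraction(1, 6),  # only the added ball is white
}
posterior_q134 = rule_ix(ways_q134)
print(posterior_q134)
```

Dividing by the total chance 7/12 is the same as multiplying by 12/7, which is the ratio in which Rule IX says every way must be increased.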

The solutions of the other questions under consideration here are given
similarly by Whitworth; we shall present them using a more modern notation
in which the use of the discrete Bayes’s Theorem is clearer than it is
in the original.


Question 135. In a parcel of 1000 dice there is one that has
every face marked six: all the rest are correctly marked. A die
taken at random out of the parcel is thrown four times and
always turns up six. What is the chance that this is the false
die? [p. 165]


Denoting by H₁ and H₂ the hypotheses that the die thrown is false and
that it is true respectively, we have, a priori,

Pr[H₁] = 1/1000 ; Pr[H₂] = 999/1000.

Let E denote the event observed. Then

$$\Pr[H_1 \mid E] = \frac{\Pr[E \mid H_1]\,\Pr[H_1]}{\Pr[E \mid H_1]\,\Pr[H_1] + \Pr[E \mid H_2]\,\Pr[H_2]} = \frac{1 \times 1/1000}{(1 \times 1/1000) + ((1/6)^4 \times 999/1000)} = \frac{48}{85}.$$

Whitworth, however, refers to this fraction as “the chance that the die
should be false and [sic] six have turned up four times” [p. 167].
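
The arithmetic of Question 135 can be checked with a short exact computation. This is only a sketch of the discrete Bayes's Theorem as used above; the helper names `posterior` and `chance_false_die`, and the option of varying the number of sixes thrown, are assumptions introduced for illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Discrete Bayes's Theorem: normalise prior x likelihood over all hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def chance_false_die(sixes, parcel=1000):
    """Chance that the die is the false one after `sixes` consecutive sixes."""
    priors = [Fraction(1, parcel), Fraction(parcel - 1, parcel)]  # false, true
    likelihoods = [Fraction(1), Fraction(1, 6) ** sixes]          # Pr[E | false], Pr[E | true]
    return posterior(priors, likelihoods)[0]


print(chance_false_die(4))
```

With no throws at all the function returns the prior 1/1000, which is a quick sanity check on the set-up.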

Question 136. A purse contains ten coins, each of which is
either a sovereign or a shilling: a coin is drawn and found to be 
a sovereign, what is the chance that this is the only sovereign? 
[p. 166] 


Before answering the question, let us note Whitworth’s comment on its 
phrasing. He writes 


the words “each of which” implies that the purse has been filled 
in such a way that each coin separately is equally likely to be a 
sovereign or a shilling ... The case is carefully marked off from 
that of Qn. 137. [p. 166] 


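This implies that the number of sovereigns X ~ b(10, 1/2). Denoting by E
the event observed, we thus have

$$\Pr[X = 1 \mid E] = \frac{\Pr[E \mid X = 1]\,\Pr[X = 1]}{\sum_{i=0}^{10} \Pr[E \mid X = i]\,\Pr[X = i]} = \frac{(1/10)\binom{10}{1}(1/2)^{10}}{\sum_{i=0}^{10} (i/10)\binom{10}{i}(1/2)^{10}} = \frac{1}{512}.$$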

Question 137. A purse contains ten coins, which are either 
sovereigns or shillings, and all possible numbers of each are 
equally likely: a coin is drawn and found to be a sovereign, 
what is the chance that this is the only sovereign? [p. 167] 


Bearing in mind Whitworth’s comment on the phrasing of Question 136, 
we have, in the notation of the solution of that question, 


$$\Pr[X = x] = 1/11, \qquad x \in \{0, 1, \ldots, 10\}.$$

Thus

$$\Pr[X = 1 \mid E] = \frac{(1/10) \times (1/11)}{\sum_{i=0}^{10} (i/10) \times (1/11)} = \frac{1}{55}.$$
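
Questions 136 and 137 differ only in the prior placed on the number of sovereigns, which the following sketch makes explicit. The function names are hypothetical; the likelihood i/10 of drawing a sovereign when i of the ten coins are sovereigns is the one used in the solutions above.

```python
from fractions import Fraction
from math import comb

COINS = 10


def binomial_prior(i):
    """Question 136: each coin separately is equally likely to be a sovereign or a shilling."""
    return Fraction(comb(COINS, i), 2 ** COINS)


def uniform_prior(i):
    """Question 137: all possible numbers of sovereigns are equally likely."""
    return Fraction(1, COINS + 1)


def chance_only_sovereign(prior):
    """Pr[X = 1 | a sovereign is drawn], where Pr[draw sovereign | X = i] = i / COINS."""
    joint = [Fraction(i, COINS) * prior(i) for i in range(COINS + 1)]
    return joint[1] / sum(joint)


print(chance_only_sovereign(binomial_prior), chance_only_sovereign(uniform_prior))
```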

Question 139. One of a pack of fifty-two cards has been removed;
from the remainder of the pack two cards are drawn
and are found to be spades; find the chance that the missing 
card is a spade. [p. 168] 


Whitworth’s solution shows that the drawing is to be carried out without
replacement. Let H₁ denote the hypothesis that the missing card is a spade,
H₂ the hypothesis that it is not a spade, and E the event that two spades
are drawn. Then

$$\Pr[H_1 \mid E] = \frac{\Pr[E \mid H_1]\,\Pr[H_1]}{\Pr[E \mid H_1]\,\Pr[H_1] + \Pr[E \mid H_2]\,\Pr[H_2]} = \frac{12/51 \times 11/50 \times 1/4}{(12/51 \times 11/50 \times 1/4) + (13/51 \times 12/50 \times 3/4)} = \frac{11}{50}.$$

Question 140. There are four dice, two of which are true and
two are so loaded that with either the chance of throwing six is
1/3. Two of them at random are thrown and turn up sixes. Find
the chance (a) that both are loaded; (b) that one only is loaded;
(c) that neither is loaded. [p. 169]


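Let H₁, H₂, H₃ denote the things whose chances are required, and let E
denote the observed event. The initial probabilities are

$$\Pr[H_1] = \binom{2}{2}\binom{2}{0} \Big/ \binom{4}{2} = \frac{1}{6}$$
$$\Pr[H_2] = \binom{2}{1}\binom{2}{1} \Big/ \binom{4}{2} = \frac{2}{3}$$
$$\Pr[H_3] = \binom{2}{0}\binom{2}{2} \Big/ \binom{4}{2} = \frac{1}{6}.$$

(Whitworth does not use this notation.) Further,

$$\Pr[E \mid H_1] = \left(\tfrac{1}{3}\right)^2, \qquad \Pr[E \mid H_2] = \left(\tfrac{1}{3}\right)\left(\tfrac{1}{6}\right), \qquad \Pr[E \mid H_3] = \left(\tfrac{1}{6}\right)^2.$$

Then

$$\Pr[H_1 \mid E] = \frac{1/9 \times 1/6}{(1/9 \times 1/6) + (1/18 \times 4/6) + (1/36 \times 1/6)} = \frac{4}{13},$$

$$\Pr[H_2 \mid E] = \frac{8}{13}, \qquad \Pr[H_3 \mid E] = \frac{1}{13}.$$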
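
The three posterior chances in Question 140 can be reproduced in a few lines. In this sketch the hypotheses are indexed by k, the number of loaded dice among the two thrown (a labelling introduced here, not Whitworth's), with the hypergeometric prior and the likelihoods given above.

```python
from fractions import Fraction
from math import comb


def loaded_dice_posterior(loaded_six=Fraction(1, 3), true_six=Fraction(1, 6)):
    """Posterior chance that k of the two dice thrown are loaded, given two sixes."""
    joint = {}
    for k in range(3):
        # choose k of the 2 loaded dice and 2 - k of the 2 true dice, out of 4 dice
        prior = Fraction(comb(2, k) * comb(2, 2 - k), comb(4, 2))
        likelihood = loaded_six ** k * true_six ** (2 - k)
        joint[k] = prior * likelihood
    total = sum(joint.values())
    return {k: value / total for k, value in joint.items()}


print(loaded_dice_posterior())
```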
At this stage in the fifth edition of Choice and Chance Whitworth turns
his attention to questions concerning the credibility of testimony. Two
examples are provided.


Question 141. A speaks truth three times out of four, B four
times out of five; they agree in asserting that from a bag containing
nine balls, all of different colours, a white ball has been
drawn; shew that the probability that this is true is 96/97. [p. 170]


An examination of Whitworth’s solution shows that A and B are presumed
to make their assertions independently of each other. Let W_d denote the
drawing of a white ball, and let A_w (B_w) denote the assertion by A (B)
that a white ball has been drawn. Then

$$\Pr[A_w \wedge B_w \mid W_d] = \Pr[A_w \mid W_d]\,\Pr[B_w \mid W_d] = \frac{3}{4} \times \frac{4}{5},$$

$$\Pr[A_w \wedge B_w \mid \overline{W}_d] = \left(\frac{1}{4} \times \frac{1}{8}\right) \times \left(\frac{1}{5} \times \frac{1}{8}\right),$$

and hence

$$\Pr[W_d \mid A_w \wedge B_w] = \frac{3/4 \times 4/5 \times 1/9}{(3/4 \times 4/5 \times 1/9) + (1/4 \times 1/8 \times 1/5 \times 1/8 \times 8/9)} = \frac{96}{97},$$

as asserted.

Question 142. A gives a true report four times out of five,
B three times out of five, and C five times out of seven. If B
and C agree in reporting that an experiment failed which A reports
to have succeeded, what is the chance that the experiment
succeeded? [p. 171]


Let E denote the event that the experiment succeeded, and let the subscripts
f and s indicate the reporting of the experiment as a failure or a
success. Once again it is to be assumed that the assertions of the witnesses
are independent. Moreover, in the absence of any information to the contrary,
we shall assume with Whitworth that E has prior probability 1/2. Then

$$\Pr[E \mid A_s \wedge B_f \wedge C_f] = \frac{\Pr[A_s \wedge B_f \wedge C_f \mid E]\,\Pr[E]}{\Pr[A_s \wedge B_f \wedge C_f \mid E]\,\Pr[E] + \Pr[A_s \wedge B_f \wedge C_f \mid \overline{E}]\,\Pr[\overline{E}]}$$

$$= \frac{4/5 \times 2/5 \times 2/7 \times 1/2}{(4/5 \times 2/5 \times 2/7 \times 1/2) + (1/5 \times 3/5 \times 5/7 \times 1/2)} = \frac{16}{31}.$$
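
Both testimony questions reduce to a two-hypothesis application of the discrete Bayes's Theorem, as the following sketch shows. The function names are hypothetical. For Question 141 it is assumed, as in the solution above, that a witness who speaks falsely names one of the eight other colours at random.

```python
from fractions import Fraction


def q141():
    """Chance that a white ball was drawn, given that A and B both assert it."""
    prior_white = Fraction(1, 9)
    both_true = Fraction(3, 4) * Fraction(4, 5)
    # each false witness is assumed to pick "white" out of the 8 colours not drawn
    both_false_white = (Fraction(1, 4) * Fraction(1, 8)) * (Fraction(1, 5) * Fraction(1, 8))
    numerator = both_true * prior_white
    return numerator / (numerator + both_false_white * (1 - prior_white))


def q142(prior_success=Fraction(1, 2)):
    """Chance of success when A reports success while B and C report failure."""
    truth = {"A": Fraction(4, 5), "B": Fraction(3, 5), "C": Fraction(5, 7)}
    given_success = truth["A"] * (1 - truth["B"]) * (1 - truth["C"])
    given_failure = (1 - truth["A"]) * truth["B"] * truth["C"]
    numerator = given_success * prior_success
    return numerator / (numerator + given_failure * (1 - prior_success))


print(q141(), q142())
```

The prior probability 1/2 in `q142` is the assumption made with Whitworth in the text; passing a different value shows how strongly the answer depends on it.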


Turning his attention to inverse probability, Whitworth gives the follow- 
ing result: 


If A be a cause which may produce the event P, and α be
the probability that when A has happened it will produce P;
and similarly if β, γ, ... be the respective chances that when
B, C, ... have happened P will be produced; then the first
“way” of P happening is made up of the compound contingency,

(1) that A shall happen,

(2) that A having happened shall produce P,


and the chance of this is aα. Similarly bβ, cγ, ... are the chances
of P happening in the other ways ... And if ... P is à posteriori
certain, the à posteriori chances of A, B, C, ... become

$$\frac{a\alpha}{a\alpha + b\beta + c\gamma + \cdots}, \quad \frac{b\beta}{a\alpha + b\beta + c\gamma + \cdots}, \quad \frac{c\gamma}{a\alpha + b\beta + c\gamma + \cdots},$$

&c. [pp. 182-183]


(from a previous proposition the (initial) chances of A, B,C,... are given 
as a,b,c,...). Whitworth does not provide his own definition of inverse 
probability, though he does say 

The term Inverse Probability is used by many writers to denote
those cases in which the à priori probability of a cause is modified
by the observation of some effect due to the cause.


[p. 183] 


A simple illustration concerning the drawing of a coin from a purse is 
then adduced in support of his contention that “no new principle is here 
introduced” [p. 184], and he concludes this section by saying that


The term “Inverse Probability” appears to be unnecessary and 
misleading. [p. 184] 


In Chapter VII Whitworth turns his attention to “The rule of succession 
(so-called)” [sic], stating initially that this rule is sometimes stated as 


If the probability of an event is entirely unknown, and it has 
been observed to happen n times in succession, the chance that 
it happens the next time is (n+ 1)/(n + 2). [p. 188] 


Finding this rule imprecise because of its referring to the vague “entirely 
unknown”, Whitworth reformulates it as follows: 


RULE. 


If the probability of an experiment succeeding is so far unknown 
that all possible probabilities may be deemed equally likely: and 
if the experiment is then found to succeed n times in succession, 
the chance that it succeeds the next time is (n + 1)/(n + 2).
[p. 190] 


Although proof of this rule is given, we shall pass over it to the following 
more general result and its proof: 


GENERALISATION OF THE RULE. 


If the probability of an experiment succeeding is so far unknown 
that all possible probabilities may be deemed equally likely: 
and if the experiment is then found to succeed p times in n 
successive trials the chance that it succeeds at the next trial is 


(p + 1)/(n + 2). [p. 192] 


To prove this Whitworth supposes that the probability of a success is x/m,
where x may take on any value in {0, 1, ..., m} each with the same probability.
The chance of there being exactly p successes in the first n trials is
then, on our using the formula of total probability, given by

$$\frac{1}{m+1} \sum_{i=1}^{m-1} \binom{n}{p} \left(\frac{i}{m}\right)^{p} \left(\frac{m-i}{m}\right)^{n-p} = N,$$

say. (Whitworth omits the von Ettingshausen symbol both here and in the
similar expressions that follow.) If the event is observed to take place, then
x can only take on values in {1, 2, ..., m − 1}, with probabilities

$$\frac{1}{N} \cdot \frac{1}{m+1} \binom{n}{p} \left(\frac{i}{m}\right)^{p} \left(\frac{m-i}{m}\right)^{n-p}, \qquad i \in \{1, 2, \ldots, m-1\},$$

the chance of a success on the (n + 1)th trial thus being

$$\sum_{i=1}^{m-1} \left(\frac{i}{m}\right)^{p+1} \left(\frac{m-i}{m}\right)^{n-p} \Bigg/ \sum_{i=1}^{m-1} \left(\frac{i}{m}\right)^{p} \left(\frac{m-i}{m}\right)^{n-p}. \tag{53}$$
Now the evaluation of a sum of the form

$$S(m; r, s) = \sum_{i=0}^{m} \left(\frac{i}{m}\right)^{r} \left(\frac{m - i}{m}\right)^{s}$$

seems to call for the Euler-MacLaurin summation formula

$$\sum_{i=0}^{m} f_i = \int_0^m f(x)\,dx + \frac{1}{2}[f(m) + f(0)] + \sum_{n=1}^{\infty} \frac{B_{2n}}{(2n)!} \left[f^{(2n-1)}(m) - f^{(2n-1)}(0)\right] \tag{54}$$

where the {B_{2n}} are the Bernoulli numbers (see Knopp [1990, p. 524]).
Whitworth himself merely suggests the division of both numerator and
denominator in (53) by m; as m increases indefinitely, the ratio will tend
to

$$\frac{(p+1)!\,(n-p)!}{(n+2)!} \Big/ \frac{p!\,(n-p)!}{(n+1)!} = \frac{p+1}{n+2}.$$

With f_i = (i/m)^r (1 − i/m)^s the first term on the right-hand side of (54)
becomes

$$\int_0^m \left(\frac{x}{m}\right)^{r} \left(\frac{m - x}{m}\right)^{s} dx = m\,\frac{\Gamma(r+1)\,\Gamma(s+1)}{\Gamma(r+s+2)}. \tag{55}$$
It is then clear that if the two sums in (53) are approximated only by the 
integrals in (55), the answer given by Whitworth is obtained. Moreover, 
since p and n in (53) are integers, the infinite sum in (54) will become a 


finite sum (many of whose terms, for large enough values of p and n, will 


be zero). The evaluation of (53) as a ratio of Gamma functions is thus not 
unreasonable. 
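
How quickly the exact ratio for finite m approaches Whitworth's (p + 1)/(n + 2) can be seen from a direct computation. The sketch below evaluates the sum S(m; r, s) exactly with fractions; the function names are illustrative, and 0 < p < n is assumed, so that summing over i = 0, ..., m gives the same value as summing over i = 1, ..., m − 1.

```python
from fractions import Fraction


def S(m, r, s):
    """S(m; r, s) = sum over i = 0..m of (i/m)^r ((m - i)/m)^s, computed exactly."""
    return sum(Fraction(i, m) ** r * Fraction(m - i, m) ** s for i in range(m + 1))


def chance_next_success(m, p, n):
    """Exact chance of success at trial n + 1 after p successes in n trials,
    when the unknown chance is x/m with x uniform on {0, ..., m}."""
    q = n - p
    return S(m, p + 1, q) / S(m, p, q)


for m in (5, 20, 200):
    print(m, float(chance_next_success(m, 3, 4)), 4 / 6)
```

For p = 3 and n = 4 the printed values approach 4/6 from below as m grows, while for p equal to n − p the ratio is exactly 1/2 for every m.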

To investigate the ratio in (53) more carefully, notice first that

$$S(m; p, q) = S(m; q, p). \tag{56}$$

Moreover, since

$$i^{p+1}(m - i)^{p} + i^{p}(m - i)^{p+1} = m\,i^{p}(m - i)^{p}, \tag{57}$$

it follows that

$$2S(m; p+1, p) = S(m; p+1, p) + S(m; p, p+1) = S(m; p, p), \tag{58}$$

and hence, by (58),

$$R(m; p, p) = S(m; p+1, p)/S(m; p, p) = \frac{1}{2}.$$


Also, for p = q, the approximation (p + 1)/(p + q + 2) becomes 1/2, and
so the exact and approximate solutions coincide for p = q. It also follows
from (56) that we need consider S(m; p, q) only for p > q.
As in (57) one can show that, for any integral k ≥ 1,

$$\sum_{j=0}^{k} \binom{k}{j} i^{p+k-j} (m - i)^{p+j} = m^{k}\, i^{p} (m - i)^{p},$$

and hence

$$S(m; p, p) = \sum_{j=0}^{k} \binom{k}{j} S(m; p+k-j, p+j).$$

It follows that, if k = 2n,

$$S(m; p+2n, p) = \frac{1}{2} S(m; p, p) - \frac{1}{2} \binom{2n}{n} S(m; p+n, p+n) - \sum_{j=1}^{n-1} \binom{2n}{j} S(m; p+2n-j, p+j),$$

while for k = 2n + 1,

$$S(m; p+2n+1, p) = \frac{1}{2} S(m; p, p) - \sum_{j=1}^{n} \binom{2n+1}{j} S(m; p+2n+1-j, p+j).$$

Thus from S(m; p, p) all S(m; p + k, p) may be calculated recursively.
While S(m; p, p) may of course be found using the Euler-MacLaurin summation
formula, the following remarks may be of interest. Let

$$g_{m,p}(i) = i^{p}(m - i)^{p}.$$

Then by Leibniz’s formula,

$$g_{m,p}^{(n)}(i) = \sum_{j=0}^{n} \binom{n}{j} (p)_j\, (-1)^{n-j} (p)_{n-j}\; i^{p-j} (m - i)^{p-n+j},$$

where differentiation is with respect to i. Let α_{n,j} be defined by

$$\alpha_{n,j} = (-1)^{n-j} \binom{n}{j} (p)_j\, (p)_{n-j} = (-1)^{n-j}\, n! \binom{p}{j} \binom{p}{n-j}.$$

Computation of g^{(n)}_{m,p} at the end-points 0 and m yields

$$g_{m,p}^{(n)}(m) - g_{m,p}^{(n)}(0) = \begin{cases} 0, & n \le p - 1 \\ m^{2p-n} (\alpha_{n,n-p} - \alpha_{n,p}), & n \in \{p, \ldots, 2p-1\} \\ 0, & n \ge 2p. \end{cases}$$

Now, for n ∈ {p, p + 1, ..., 2p − 1},

$$\alpha_{n,n-p} - \alpha_{n,p} = (-1)^{p}\, n! \binom{p}{2p-n} [1 - (-1)^{n}].$$

Noting that

$$S(m; p, p) = \sum_{i=0}^{m} (i/m)^{p} (1 - i/m)^{p},$$

we find from (54) that

$$S(m; p, p) = \int_0^m \left(\frac{x}{m}\right)^{p} \left(\frac{m - x}{m}\right)^{p} dx + \frac{1}{m^{2p}} \sum_{n=1}^{\infty} \frac{B_{2n}}{(2n)!} \left[g_{m,p}^{(2n-1)}(m) - g_{m,p}^{(2n-1)}(0)\right]. \tag{59}$$

Denoting by A the set of integers in {(p + 1)/2, ..., p}, we may write the
last sum as

$$\frac{1}{m^{2p}} \sum_{n \in A} \frac{B_{2n}}{(2n)!}\, m^{2p-2n+1} (2n - 1)!\, (-1)^{p} \binom{p}{2p-2n+1} [1 - (-1)^{2n-1}].$$

The term in crotchets being always 2, this sum becomes

$$(-1)^{p} \sum_{n \in A} \frac{B_{2n}}{n} \frac{1}{m^{2n-1}} \binom{p}{2p-2n+1}.$$

Substitution in (59) yields

$$S(m; p, p) = m \left[ \frac{\Gamma(p+1)\,\Gamma(p+1)}{\Gamma(2p+2)} + (-1)^{p} \sum_{n \in A} \frac{B_{2n}}{n} \frac{1}{m^{2n}} \binom{p}{2p-2n+1} \right],$$

an expression that may make calculation easier.
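
The closed form for S(m; p, p) lends itself to a direct numerical check. The sketch below reads it as m times the sum of Γ(p+1)²/Γ(2p+2) and (−1)^p Σ (B_{2n}/n) m^{−2n} C(p, 2p−2n+1), with n running over the integers from (p+1)/2 to p. That reading, the restriction to p ≥ 1 (so that the summand vanishes at both end-points), and all function names are assumptions of this illustration. The Bernoulli numbers are generated by the usual recurrence.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_numbers(n_max):
    """B_0, ..., B_{n_max} from the recurrence sum_{k=0}^{n} C(n+1, k) B_k = 0 (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B


def S_direct(m, p):
    """S(m; p, p) summed term by term."""
    return sum(Fraction(i, m) ** p * Fraction(m - i, m) ** p for i in range(m + 1))


def S_closed(m, p):
    """S(m; p, p) from the Euler-MacLaurin closed form, assuming p >= 1."""
    B = bernoulli_numbers(2 * p)
    beta_term = Fraction(factorial(p) ** 2, factorial(2 * p + 1))
    correction = sum(
        B[2 * n] / n * Fraction(1, m ** (2 * n)) * comb(p, 2 * p - 2 * n + 1)
        for n in range((p + 2) // 2, p + 1)  # integers from ceil((p + 1)/2) to p
    )
    return m * (beta_term + (-1) ** p * correction)


print(S_direct(6, 3), S_closed(6, 3))
```

Because the summand is a polynomial in i, the Euler-MacLaurin series terminates, and the two functions should agree exactly rather than approximately.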

No sign of anything more than the use of the discrete Bayes’s Theorem 
is to be seen in Choice and Chance; perhaps all one can note is the extent 
to which the use of that result had become common by the 1860’s. 


Laurent to Pearson 


I have one concluding favour to request of 
my reader; that he will not expect to be 
equally diverted & informed by every line 
or every page of this discourse; but give 
some allowance to the author’s spleen, & 
short fits or intervals of dullness, as well 
as his own. 


Jonathan Swift, Gulliver’s Travels 
& Other Writings. 


9.1 Mathieu Paul Hermann Laurent (1841-1908)


In 1873 Laurent¹ published his Traité du Calcul des Probabilités, a work
that was to be considered as “une véritable Introduction au Traité de
Laplace” [pp. ix-x]. The work begins with definitions and general comments,
and leaving these aside, we find the following statement of a “théorème
fondamental dû au géomètre anglais Bayes”:


Soient p₁, p₂, ..., p_i, ... les probabilités que des causes C₁, C₂,
..., C_i, ..., s’excluant mutuellement, donnent respectivement
à l’événement E. Soient q₁, q₂, ..., q_i, ... les probabilités de
ces causes. Supposons maintenant que l’événement E ait été
observé dans une épreuve, la probabilité w_i que l’arrivée de
l’événement observé est due à la cause C_i est donnée par la
formule w_i = p_i q_i/(p₁q₁ + p₂q₂ + ··· + p_i q_i + ···). [p. 57]


Laurent had earlier, by-the-by, given a precise definition of “cause”, viz. 


Nous appellerons cause d’un événement, dont l’arrivée n’est pas 
certaine, ce qui lui donne sa probabilité. [p. 47] 


The above expression for w_i is also given for equal q_i.
In the section entitled “Théorème inverse de celui de Bernoulli” Laurent
supposes that an event E, of constant though unknown probability, has
occurred α times in s trials. Then [p. 107] “en vertu du théorème de Bayes”

$$P = \Pr(|p - \alpha/s| < l) = \int_{\alpha/s - l}^{\alpha/s + l} x^{\alpha}(1 - x)^{s - \alpha}\,dx \Big/ \int_0^1 x^{\alpha}(1 - x)^{s - \alpha}\,dx.$$

He obtains further the limit

$$P = \frac{2}{\sqrt{\pi}} \int_0^{l\sqrt{s^3/(2\alpha(s-\alpha))}} e^{-t^2}\,dt$$

for sl/α and sl/(s − α) very small and of order 1/√s.


This work is also to be noted for its extensive bibliography of the prin- 
cipal works on probability published to that date. 


9.2 Cecil James Monro (1833-1882) 


In 1874 Monro, in his paper “Note on the inversion of Bernoulli’s theorem
in probabilities”, suggested that under the name “Bernoulli’s Theorem”
two results, the deductive and the inductive, should be comprehended. In
the former the probability p of a given result on a single trial should be
regarded as constant (i.e. known), this not being so in the latter. If we
denote by P the probability that, in the “deductive” case, the desired result
is produced from mp − l to mp + l times (or from x − l to x + l, where x
is the largest integer not exceeding (m + 1)p), and by P′ the probability,
in the “inductive” setting, that the facility of a given result that has been
produced mp times in m trials (with a constant facility of production) lies
between p − l/m and p + l/m, then

$$P = 2\sqrt{h/\pi} \int_0^{l + 1/2} e^{-h\lambda^2}\,d\lambda,$$

where h = [2p(1 − p)m]^{−1}, as given by Laplace. Here it is assumed that l is
of order √m at most, and also that terms of order 1/m may be neglected.
Monro points out Laplace’s two methods for the inversion of this result.
In the first² of these P′ is set equal to P “by an implicit inference from the
deductive theorem” [p. 74], while in the second P′ is (correctly) given by

$$P' = 2\sqrt{h/\pi} \int_0^{l} e^{-h\lambda^2}\,d\lambda$$

under the assumption of a uniform prior. Assuming that equal ranges contain
equally probable values, Monro shows that


the inversion is so far legitimate, that either theorem may be inferred
from the other with little calculation, ... and accordingly
that the two solutions are identical in principle. [p. 75]
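
The difference between the deductive and inductive forms can be made concrete numerically. In the sketch below both approximations are written through the error function, since 2√(h/π) ∫₀^L e^{−hλ²} dλ = erf(L√h), and each is compared with an exact value: a binomial sum in the deductive case and, assuming a uniform prior, a Beta posterior in the inductive case. The function names and the choice m = 100, p = 1/2, l = 5 are assumptions made for illustration.

```python
from math import comb, erf, sqrt


def binom_pmf(k, n, x):
    """Chance of exactly k results in n trials, each of facility x."""
    return comb(n, k) * x ** k * (1 - x) ** (n - k)


def deductive_exact(m, p, l):
    """Pr[mp - l <= number of results <= mp + l] for known facility p."""
    centre = round(m * p)
    return sum(binom_pmf(k, m, p) for k in range(centre - l, centre + l + 1))


def beta_cdf(x, a, b):
    """For integer a, b: Pr[Beta(a, b) <= x] = Pr[Bin(a + b - 1, x) >= a]."""
    n = a + b - 1
    return sum(binom_pmf(k, n, x) for k in range(a, n + 1))


def inductive_exact(m, p, l):
    """Posterior Pr[|facility - p| < l/m] after mp results in m trials, uniform prior."""
    successes = round(m * p)
    a, b = successes + 1, m - successes + 1
    return beta_cdf(p + l / m, a, b) - beta_cdf(p - l / m, a, b)


def laplace_P(m, p, l):
    """Deductive approximation, upper limit l + 1/2."""
    h = 1 / (2 * p * (1 - p) * m)
    return erf((l + 0.5) * sqrt(h))


def laplace_P_prime(m, p, l):
    """Inductive approximation, upper limit l."""
    h = 1 / (2 * p * (1 - p) * m)
    return erf(l * sqrt(h))


print(deductive_exact(100, 0.5, 5), laplace_P(100, 0.5, 5))
print(inductive_exact(100, 0.5, 5), laplace_P_prime(100, 0.5, 5))
```

The extra 1/2 in the deductive limit is the familiar continuity correction for an integer-valued count; no such correction arises in the inductive case, where the facility varies continuously.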

To this end he notes firstly that l + 1/2 may be substituted for l in
the statement of the deductive theorem, since our concern is with integral
values of λ. Secondly, as regards the inductive theorem,


P is the probability that the facility lies between the limits
p ± (l + 1/2)/m, and the second solution is correct for the limits
p ± l/m; provided always that a valid correspondence exists
between the two theorems. [p. 76]


To establish the desired correspondence, Monro denotes by u_n the probability
of n = mw results in m trials, each of facility x/m, and by U_n the
probability given by x results in m trials that their (constant) facility is
within ±dw of w: “This supposition expresses the hypothesis of equally
probable values of the facility within equal ranges” [p. 76]: the required
proviso is then established by comparing u_n in the deductive case with
∫ U_n dw, between (n − 1/2)/m and (n + 1/2)/m, in the inductive. Now

$$u_n = \frac{m!}{n!\,(m-n)!} \left(\frac{x}{m}\right)^{n} \left(\frac{m-x}{m}\right)^{m-n},$$

$$U_n = \frac{w^{x}(1-w)^{m-x}}{\int_0^1 w^{x}(1-w)^{m-x}\,dw} = \frac{(m+1)!}{x!\,(m-x)!} \left(\frac{n}{m}\right)^{x} \left(\frac{m-n}{m}\right)^{m-x}.$$

(Note the substitution of n/m for w in the numerator of U_n.) Neglect of
terms of order 1/m results in U_n = (m + 1)u_n, and the required integration
yields, to the desired degree of approximation, the stated result³.

9.3 William Stanley Jevons (1835-1882) 


Although well known for his work in economics and logic, Jevons⁴ is less
remembered for his statistical work. Of his writings the only one that seems 
relevant here is his book The Principles of Science: a treatise on logic and 
scientific method, published in two volumes in 1874 and in one volume in 
1877. Of this work Keynes is somewhat scathing, saying 


There are few books, so superficial in argument yet suggesting
so much truth, as Jevons’s Principles of Science. [1921, chap.
XXIII, §10]


Further, while stressing the important advance made by Jevons when he
“emphasised the close relation between induction and probability”, Keynes
goes on to say


Combining insight and error, he spoilt brilliant suggestions by
erratic and atrocious arguments. His application of inverse prob- 
ability to the inductive problem is crude and fallacious, but the 
idea which underlies it is substantially good. [loc. cit.] 


Be that as it may: let us turn forthwith to Jevons’s book itself⁵.

The tenth chapter, entitled “The theory of probability”, is devoted to
a fairly general discussion of chance and probability, the latter being
understood as having reference to our mental condition⁶. Because he finds
difficulties with “belief”, Jevons prefers to say that “the theory of
probability deals with quantity of knowledge” [1877, p. 199].

The method to be used in the theory has as basis the calculation of “the 
number of all the cases or events concerning which our knowledge is equal” 
[p. 201]. Rules for the calculation of probabilities are given, and the impor- 
tance of distinguishing between absolute and comparative probabilities is 
stressed. Boole’s method is found to be “fundamentally erroneous” [p. 206], 
Jevons siding with Wilbraham in this matter. 

In this chapter are to be found some remarks on antecedent (or prior)
probabilities, including the famous example⁷ that the only odds that may
be ascribed to “a Platythliptic Coefficient is positive” are evens [p. 212].
Jevons also comments on Terrot’s suggestion that the symbol 0/0 should be
used, rather than 1/2, to express complete doubt⁸, and goes on to say


if we grant that the probability may have any value between 0 
and 1, and that every separate value is equally likely, then n 
and 1 — n are equally likely, and the average is always 1/2. Or 
we may take p.dp to express the probability that our estimate 
concerning any proposition should lie between p and p + dp. 
The complete probability of the proposition is then the integral 
taken between the limits 1 and 0, or again 1/2. [pp. 212-213] 


From the first sentence it seems to follow that 2/8 and 7/8 (say) are also 
equally likely, and their average is no longer 1/2: so some care is needed 
here. Keynes criticizes Jevons’s views on this matter as follows: 


It is difficult to see how such a belief, if even its most imme- 
diate implications had been properly apprehended, could have 
remained plausible to a mind of so sound a practical judgement 
as his. [1921, chap. XX, §7] 


In the twelfth chapter, “The inductive or inverse application of the theory
of probability”, we find a statement of Laplace’s proposition for inverse
application of the rules of probability⁹, viz.


\[ \Pr[H_i \mid E] \propto \Pr[E \mid H_i] \]


under the assumption of a priori equally probable causes $H_i$. (Here the
symbol “|” is interpreted as “inferred from” on the left-hand side and
“derived from” on the right-hand.) We also find here a discrete Bayes’s
Rule, formulated in symbols and also in words as follows:


If it is certain that one or other of the supposed causes exists, 
the probability that any one does exist is the probability that if 
it exists the event happens, divided by the sum of all the similar 
probabilities. [p. 243] 
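
The verbal rule just quoted can be put in executable form. The following Python sketch is offered only as an illustration: the function name and the numerical likelihoods are hypothetical, and the causes are assumed, as in Laplace's proposition, to be equally probable a priori.

```python
from fractions import Fraction

def posterior_equal_priors(likelihoods):
    """Return Pr[H_i | E] for causes H_i assumed equally probable a priori.

    likelihoods[i] is Pr[E | H_i], the probability that the event happens
    if the i-th cause exists.
    """
    total = sum(likelihoods)
    if total == 0:
        raise ValueError("the event is impossible under every supposed cause")
    # Each likelihood is divided by the sum of all the similar probabilities.
    return [Fraction(value) / total for value in likelihoods]

# Hypothetical illustration: three causes under which the event has
# probability 3/4, 1/2 and 1/4 respectively.
example = posterior_equal_priors([Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)])
print(example)
```

Exact fractions are used so that the posterior probabilities can be compared with hand calculation without rounding.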


The next section of this chapter is devoted to some simple applications 
of the inverse method, chiefly of an astronomical nature. Again we find 
Keynes taking exception, albeit slight, to Jevons’s use of the principle of 
the inverse method in scientific induction. 

The general inverse problem is stated as follows: 


An event having happened a certain number of times, and failed 
a certain number of times, required the probability that it will 
happen any given number of times in the future under the same 
circumstances. [p. 251] 


As an illustration Jevons considers a “balls and ballot-box” example that 
he attributes to Condorcet: an urn contains four black or white balls, in 
unknown ratio; if four drawings (with replacement) have yielded three white 
balls, what is the probability that the next draw will also yield a white ball? 
Jevons first deduces the posterior probabilities of the hypotheses specifying 
the composition of the ballot-box, and then, in the usual manner, finds the 
required probability. 
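
The two steps described above may be sketched in Python. This is an illustration under stated assumptions rather than a transcription of Jevons's working: the five possible compositions of the urn (0 to 4 white balls) are taken to be equally probable a priori, and the function and parameter names are ours.

```python
from fractions import Fraction
from math import comb

def urn_predictive(n_balls, n_draws, n_white):
    """Posterior over urn compositions, and the chance the next draw is white."""
    # Hypothesis k: the urn holds k white balls out of n_balls.
    # Assumption: every composition is equally probable a priori.
    likelihoods = []
    for k in range(n_balls + 1):
        p = Fraction(k, n_balls)
        likelihoods.append(
            comb(n_draws, n_white) * p**n_white * (1 - p)**(n_draws - n_white)
        )
    total = sum(likelihoods)
    posterior = [value / total for value in likelihoods]
    # Average the chance of white over the posterior distribution.
    next_white = sum(post * Fraction(k, n_balls) for k, post in enumerate(posterior))
    return posterior, next_white

posterior, next_white = urn_predictive(n_balls=4, n_draws=4, n_white=3)
print(posterior)
print(next_white)
```

Under these assumptions the compositions with one, two and three white balls receive posterior probabilities 3/46, 16/46 and 27/46, and the probability of a further white ball is 29/46.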

He next passes to the general solution of the inverse problem, presenting, 
though without proof, the customary expressions arising in the rule of 
succession. He then considers the extension to more than two possibilities¹⁰:
thus if there are n events $A_1, A_2, \ldots, A_n$, and $A_i$ has occurred $r_i$ times,
then the probability that the next event will be $A_i$ is


\[ (r_i + 1) \Big/ \sum_{j=1}^{n} (r_j + 1) . \]

Furthermore, 


if new events may happen in addition to those which have been 
observed, we must assign unity for the probability of such new 
event. [p. 258] 


Thus, if there is one such new event, the probability that the next event
will be $A_i$ is


\[ (r_i + 1) \Big/ \Big[ \sum_{j=1}^{n} (r_j + 1) + 1 \Big] . \]
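
A short Python sketch may make the extended rule concrete. It assumes the form $(r_i + 1) / \sum_j (r_j + 1)$ for the probability, with one further unit added to the denominator for each admitted new kind of event; the function name, its arguments and the sample counts are hypothetical.

```python
from fractions import Fraction

def next_event_probability(counts, i, unseen_events=0):
    """Probability that the next event is A_i (i is a 0-based index).

    counts[j] is the number of times A_j has been observed; unseen_events
    is the number of not-yet-observed kinds of event admitted as possible.
    """
    # Each observed kind contributes its count plus one; each admitted
    # unobserved kind contributes unity to the denominator.
    denominator = sum(r + 1 for r in counts) + unseen_events
    return Fraction(counts[i] + 1, denominator)

# With two kinds of event, one of which has never failed in m = 5 trials,
# the ordinary rule of succession (m + 1)/(m + 2) is recovered.
print(next_event_probability([5, 0], 0))
# Three observed kinds, with one new kind admitted as possible.
print(next_event_probability([3, 2, 1], 0, unseen_events=1))
```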


Jevons stresses the need for the incorporation of all additional information
in the application of the method of inverse probabilities¹¹. We also find
here a comment to the effect that, if a coin is to be tossed for the first time,
we should assign probability 1/2 to each of the two possible outcomes. 
However, the obtaining of a head on the first throw provides “very slight 
experimental evidence in favour of a tendency to show head” [p. 260]. This 
is, of course, in accordance with the rule of succession, though it does seem 
to suggest that, after one toss, a coin will always be considered — even if 
only temporarily — as biased. 

Jevons mentions the thoughts of James Bernoulli and de Moivre on the 
estimation of the probability of future events from past experience, al- 
though Bayes and Price were “undoubtedly the first who put forward any 
distinct rules on the subject” [p. 261]. Mention is also made of the con- 
tributions of Condorcet “and several other eminent mathematicians” and 
of Laplace, who carried “the solution of the problem almost to perfection” 
[p. 261]. 

Writing of subjects in which deduction is only probable, Jevons proposes 
the following scheme: 


(1) We frame an hypothesis. 

(2) We deduce the probability of various series of possible con- 
sequences. 

(3) We compare the consequences with the particular facts, and 
observe the probability that such facts would happen under 
the hypothesis. [p. 267] 


This reasonable scheme is however followed by a statement that immedi- 
ately provides grounds for Keynes’s criticism: Jevons writes 


The above processes must be performed for every conceivable 
hypothesis, and then the absolute probability of each will be 
yielded by the principle of the inverse method. [p. 267] 


This rule Jevons describes as “that which common sense leads us to adopt 
almost instinctively” [1877, p. 243]: Keynes views it as a “fallacious prin- 
ciple” [1921, chap. XVI, §14]. 


9.4 Rudolf Hermann Lotze (1817-1881) 


In 1874 Lotze published his Logik, in which he expressed the view that 
probability is subjective; indeed he says of probability that*


sie bezeichnet, zunächst wenigstens, durchaus nur subjectiv das
Maß des vernünftigen Zutrauens, welches wir im voraus zu dem
Eintreten eines bestimmten Falles dann hegen dürfen, wenn uns
nur die Anzahl aller unter den jedesmal gegebenen Bedingungen
möglichen Fälle, aber kein sachlicher Grund gegeben ist, der für
Nothwendigkeit des einen von ihnen mit Ausschluß der anderen
entscheide. [chap. IX, art. 282.1]


*Quotations are from the third edition of 1912.


Lotze further seems to suggest that a cause C whose likelihood is greater
than that of any other (conditional on the occurrence of some event E)
should be regarded as the cause of that event, when he writes


Wenn gegebene Thatsachen aus mehreren verschiedenen Ursachen
ableitbar sind, so ist diejenige Ursache die wahrscheinlichste,
unter deren Voraussetzung die aus ihr berechnete Wahrscheinlichkeit
der gegebenen Thatsachen die größte wird.
[chap. IX, art. 282.4]


Though whether this is supposed to imply that Pr[C | E] is necessarily the 
greatest is not clear. 

His discussion of the rule of succession bears note: after stating that
(m + 1)/(m + 2) is the probability that an event E will occur one further
time if it has been observed m times without exception, Lotze provides the
following proof of his result: in this fraction, viz. (m + 1)/(m + 2),


der Nenner enthält die Summe der denkbaren Fälle, denn nach
m wirklichen Fällen kommen immer 2 denkbare, Wiederholung
und Nichtwiederholung des E, hinzu. [chap. IX, art. 282.5]


And this deduction, he further asserts, 


mir scheint sie nicht viel weniger überzeugend, als die
undurchsichtigere analytische Behandlung, durch die man sie gewöhnlich
gewinnt. [chap. IX, art. 282.5]


9.5 Charles Sanders Peirce (1839-1914)


As befitted a philosopher of his stature, Peirce included in his voluminous
writings some thoughts on probability and inference. An examination of
his Collected Papers has revealed some remarks relevant to our present
work¹².

Peirce’s views on probability were catholic¹³, and many of what have
become tenets of the various probabilistic schools that exist today are given
in one or other of his papers. For instance, in “The doctrine of chances” of
1878 Peirce wrote¹⁴


Probability is a kind of relative number; namely, it is the ratio of 
the number of arguments of a certain genus which carry truth 
with them to the total number of arguments of that genus, 


[1878a, p. 612]; {2.657} 

a remark that has a Laplacean smack about it. Hard on the heels of this
we find the following: 


To find the probability that from a given class of premisses, A, 
a given class of conclusions, B, follows, it is simply necessary 
to ascertain what proportion of the times in which premisses of 
that class are true, the appropriate conclusions are also true. 
In other words, it is the number of cases of the occurrence of 
both the events A and B, divided by the total number of cases 
of the occurrence of the event A, [1878a, p. 613]; {2.658} 


a typical “finite frequency” definition. 

A little further on in this paper we find evidence of a leaning towards a
propensity interpretation¹⁵, for in writing of the statement that the
probability that a tossed die will show a number divisible by three is one-third,
Peirce says


The statement means that the die has a certain “would-be”: 
and to say that a die has a “would-be” is to say that it has a 
property, quite analogous to any habit that a man might have.


{2.664} 


Elsewhere Peirce connects his interpretation of probability to belief. In 
“The probability of induction” of 1878 he writes 


Probability is the ratio of the favorable cases to all the cases. 
Instead of expressing our result in terms of this ratio, we may 
make use of another — the ratio of favorable to unfavorable 
cases. This last ratio may be called the chance of an event, 


[1878b, p. 708]; {2.675} 
while later in the same paper he affirms that 


it is incontestable that the chance of an event has an intimate
connection with the degree of our belief in it. 


[1878b, p. 708]; {2.676} 


This “feeling of belief”, he further notes, “should be as the logarithm of 
the chance” (loc. cit.), and, moreover, “[Probability] is, therefore, a thing 
to be inferred upon evidence” [1878b, p. 709]; {2.677}. 

The “probability as limiting frequency” school will also find support in 
Peirce’s writings. For in “The varieties and validity of induction” [c.1905] 
we read 


The reasoning of the calculus of probabilities consists simply of 
demonstrations concerning “probabilities,” which, in all useful 
applications of the calculus, are real probabilities, or ratios of 
frequency in the “long run” of experiences of designated species 
among experiences designated, or obviously designable, genera
over those species; which real probabilities are ascertained by
quantitative inductions from statistics laboriously collected and
critically tabulated. {2.763}


Yet again [Peirce, 1903] he writes “probability is a statistical ratio” {5.21}, 
and “it [i.e. probability] also refers to a long run” (loc. cit.). 

Even the followers of so arcane a school as “collectivism”¹⁶ will find
what might be regarded, by not too great a stretch of the imagination, as 
a striving towards one of their main tenets. In his “Notes on ampliative 
reasoning” of 1902 Peirce writes 


If of an endless series of possible experiences a definite proportion
will present a certain character (which is the sort of fact
called an objective probability), then it necessarily follows that,
foreseen or not, approximately the same proportion of any finite 
portion of that series will present the same character, either as 
it is, or when it has been sufficiently extended. {2.785} 


It seems, then, that members of almost all modern schools of probability 
could find passages in Peirce’s writings that would allow them to claim him 
as a confrère. It is not, however, necessary to place him in any particular
camp to be able to follow his thoughts on inverse probability. 

Exactly what Peirce meant by this last term is unclear, for in his “Notes 
on ampliative reasoning” [1902] we read 


Laplace and other mathematicians, though they regard a prob- 
ability as a ratio of two numbers, yet, instead of holding that 
it is the limiting ratio of occurrences of different kinds in the 
course of experience, hold that it is the ratio between numbers
of “cases,” or special suppositions, whose “possibilities” (a
word not clearly distinguished, if at all, from “probabilities”)
are equal in the sense that we are aware of no reason for inclining 
to one rather than to another. This is an error often appearing 
in the books under the head of “inverse probabilities.” {2.785} 


Nevertheless we shall try to examine what we consider to be pertinent 
remarks. 

In his [1878b] Peirce discusses the rule of succession. Referring to “Most 
treatises on probability” [1878b, p. 712]; {2.682}, he writes 


They state, for example, that if one of the ancient denizens of 
the shores of the Mediterranean, who had never heard of tides, 
had gone to the bay of Biscay, and had there seen the tide rise, 
say m times, he could know that there was a probability equal 
to 

(m + 1)/(m + 2) 


that it would rise the next time. [1878b, p. 712]; {2.682} 

In the next paragraph Peirce notes that this result is ridiculous when m = 0,
i.e. when the observer had never seen the tide rise. We have already (in §4.6) 
noted Pearson’s comments on a similar situation, and these are sufficient 
to set aside Peirce’s criticism. Further remarks by Peirce on this matter 
show the inadvisability of drawing too close an analogy between problems 
involving balls-and-urns, say, and those involving natural phenomena. 

In a later paper, “A theory of probable inference” [1883a], Peirce repeats 
the problem concerning the ancient denizen — though the Bay of Biscay 
now becomes the broader shore of the Atlantic Ocean. He further correctly 
notes that the application of the doctrine of inverse probabilities requires 
knowledge of a certain prior probability, knowledge that is lacking in this 
maritime question. Similarly, since knowledge of the conclusion is missing 
before the inference in pure hypothesis or induction, 


it is impossible that the theory of inverse probabilities should 
rightly give a value for the probability of a pure inductive or 
hypothetic conclusion. [1883a, p. 172]; {2.744} 


Expanding his discussion of the choice of a prior distribution, Peirce 
notes that 


The principle which is usually assumed by those who seek to 
reduce inductive reasoning to a problem in inverse probabilities 
is, that if nothing whatever is known about the frequency of 
occurrence of an event, then any one frequency is as probable 
as any other. [1883a, p. 172]; {2.745} 


Thus, in the adduced example of four possible occasions on which an event
may occur, one would presumably assign probabilities


\[ \Pr[X = x] = 1/5 , \quad x \in \{0, 1, \ldots, 4\} \tag{1} \]


where X denotes the number of occurrences of that event. This assignment
of probabilities is found, by Peirce, to be less satisfactory than that
which assigns equal probabilities to the sixteen possible “constitutions of
the universe” (loc. cit.)


YYYY, YYYN ,..., NNNN, 


where “Y” and “N” stand for “occurrence” and “non-occurrence” respectively.
This last assignment implies that the number of occurrences now
has the binomial distribution b(4, 1/2).

Peirce’s reason for preferring the second of these priors to the first is
interesting: it runs as follows. Consider, in the preceding scheme, the number
of times (say Z) in which a Y follows a Y or an N an N. For example, in
YYYY and NNNN we have Z = 3, while YYNY gives Z = 1. Under our
original assumption (1) we find that Z does not have a uniform distribution,
“the probability of three occurrences being half as large again as that
of two, or one” [1883a, p. 174]; {2.746}. Now Peirce does not say how he
arrives at this statement: here is a possible explanation. Suppose as before
that


\[ \Pr[X = x] = 1/5 , \quad x \in \{0, 1, \ldots, 4\} , \]


and suppose too that the probability of any given value of X is uniformly
spread over the possible arrangements of Y’s and N’s making up that given value.
Thus, for example,


\[ \Pr[YYYY] = 1/5 \]


and


\[ \Pr[YYYN] = \Pr[YYNY] = \Pr[YNYY] = \Pr[NYYY] = 1/20 . \]


Similarly each of the $\binom{4}{2} = 6$ arrangements of Y’s and N’s having exactly
two Y’s has probability 1/30. Then


\begin{align*}
\Pr[Z = 3] &= \Pr[YYYY \text{ or } NNNN] \\
           &= \Pr[YYYY] + \Pr[NNNN] \\
           &= 2/5 ; \\
\Pr[Z = 2] &= \Pr[YYYN] + \Pr[NNNY] + \Pr[YYNN] \\
           &\quad + \Pr[NNYY] + \Pr[NYYY] + \Pr[YNNN] \\
           &= (1/20) + (1/20) + (1/30) + (1/30) + (1/20) + (1/20) \\
           &= 4/15 ;
\end{align*}


and similarly


\[ \Pr[Z = 1] = 4/15 \quad \text{and} \quad \Pr[Z = 0] = 1/15 . \]


Hence, as Peirce asserts,


\[ \Pr[Z = 3] / \Pr[Z = 2] = (2/5) / (4/15) = 3/2 . \]


On the other hand, if the sixteen possible constitutions are equally probable 
in the first case, they are also equally probable in the second. 
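
The explanation offered above can be checked by enumerating the sixteen constitutions directly. The Python sketch below is ours and is meant only as a verification aid; the function names are hypothetical, and the first weighting embodies the assumption that the probability 1/5 of each value of X is spread evenly over its arrangements.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

def z_distribution(weight):
    """Distribution of Z, the number of adjacent equal pairs, under a weighting."""
    dist = defaultdict(Fraction)
    for seq in product("YN", repeat=4):
        # Count the places at which a Y follows a Y or an N follows an N.
        z = sum(a == b for a, b in zip(seq, seq[1:]))
        dist[z] += weight(seq)
    return dict(dist)

def uniform_on_count(seq):
    # Assignment (1): Pr[X = x] = 1/5, spread evenly over the C(4, x) arrangements.
    return Fraction(1, 5) / comb(4, seq.count("Y"))

def uniform_on_constitutions(seq):
    # Each of the sixteen constitutions is equally probable.
    return Fraction(1, 16)

by_count = z_distribution(uniform_on_count)
by_constitution = z_distribution(uniform_on_constitutions)
print(by_count)
print(by_constitution)
```

Under the first weighting the sketch returns 2/5, 4/15, 4/15 and 1/15 for Z = 3, 2, 1 and 0, in agreement with the figures above; under the second Z has the binomial distribution b(3, 1/2).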

Peirce’s scornful opinion of the rule of succession¹⁷ shows itself again in
“Three types of reasoning”, the sixth lecture in his Lectures on Pragmatism 
of 1903. He first of all criticizes Laplace, saying 


Laplace was of the opinion that the affirmative experiments 
impart a definite probability to the theory; and that doctrine 
is taught in most books on probability to this day, although 
it leads to the most ridiculous results, and is inherently self-contradictory.
It rests on a very confused notion of what probability is.
Probability applies to the question whether a specified
kind of event will occur when certain predetermined conditions 
are fulfilled; and it is the ratio of the number of times in the 
long run in which that specified result would follow upon the 
fulfillment of those conditions to the total number of times in 
which those conditions were fulfilled in the course of experience.
It essentially refers to a course of experience, or at least
of real events; because mere possibilities are not capable of being
counted. {5.169}


He also dismisses Quetelet’s discussion of the rule of succession (see §8.12) 
as “downright nonsense” (loc. cit.). 

Peirce’s other concern, if we restrict our attention merely to matters directly
pertinent to present interests, is with questions of testimony, a topic
discussed in an unpublished manuscript, c.1901, entitled “On the logic of
drawing history from ancient documents especially from testimonies”. Finding
Hume’s thoughts on this matter, expressed in his essay “On miracles”,
to be “excessively crude” and to be expressed “in a confused and untenable
form” {7.165}, Peirce corrects this doctrine in the following way. Suppose
that one has a number of independent arguments, those pro a specific event
leading to the truth $p_1, p_2$, etc. times for every $q_1, q_2$, etc. times they lead
to error. Similarly, the arguments con that event lead to the truth $q'_1, q'_2$,
etc. times for every $p'_1, p'_2$, etc. times they lead to error (notation altered).
Then the probability that the arguments pro all lead to the truth and the
arguments con all lead to error is


\[ \prod_i \frac{p_i}{p_i + q_i} \cdot \frac{p'_i}{p'_i + q'_i} . \]


In the same way the probability that arguments pro all lead to error while
arguments con all lead to the truth is


\[ \prod_i \frac{q_i}{p_i + q_i} \cdot \frac{q'_i}{p'_i + q'_i} . \]


Since one or other of these alternatives necessarily obtains the odds are


\[ \prod_i p_i p'_i / q_i q'_i . \]


Then, says Peirce, 


This is Hume’s Theory Improved, by merely being disembar- 
rassed of blunders, {7.166} 


and he names it the theory of balancing likelihoods. 

While noting that this theory is sometimes applicable, Peirce considers
it to be a poor way of handling ancient documents. As an example of a 
case in which it is correctly applied, he considers the following: 


Taking the time-honored urn from which balls are drawn at 
random and thrown back after each drawing, I will suppose, 
that every ball is, in fact, a box, and that out of every 7 of 
them 3 contain gold and 4 lead. I will also suppose that I have 
two expert witnesses, one of whom judges by the color, and is 
right 3 times to every time he fails, while the other judges by the 
weight, and is right 9 times for every 5 failures. Let us suppose 
the testimony is independent, the color-expert being just as 
proportionally often right when the material-expert is right as 
when he is wrong. In order to fix our ideas, let us suppose the 
numbers are as follows: 


                 Auriferous.                      Plumbiferous.

            Heavy            Light            Heavy            Light
Yellow,  ${}_aA_a$  15    ${}_pA_a$  35    ${}_aP_a$  14    ${}_pP_a$   6
Grey,    ${}_aA_p$  21    ${}_pA_p$   1    ${}_aP_p$  10    ${}_pP_p$  66


{7.168} 


Suppose that a ball is drawn and that both witnesses report it as auriferous.
By the rule mentioned above, the odds that this ball is indeed gold are


\[ \frac{3}{1} \cdot \frac{9}{5} = \frac{27}{5} , \]


which agrees with


\[ \frac{{}_aA_a + {}_pP_p}{{}_pA_p + {}_aP_a} = \frac{81}{15} = \frac{27}{5} . \]


This solution may be more expansively set out as follows: let $E_1$ and $E_2$
denote the two witnesses, and let Y, G, H, L, $A_u$ and $P_b$ stand for yellow,
grey, heavy, light, auriferous and plumbiferous. Then


$E_1$: correct diagnosis as Y and $A_u$,
$E_2$: correct diagnosis as H and $A_u$:                           15 cases

$E_1$: not Y (i.e. G) and not $A_u$ (i.e. $P_b$),
$E_2$: not H (i.e. L) and not $A_u$ (i.e. $P_b$):                    66 cases

or 81 cases.


Similarly,


$E_1$: incorrect diagnosis as G and $A_u$,
$E_2$: incorrect diagnosis as L and $A_u$:                         1 case

$E_1$: not G (i.e. Y) and not $A_u$ (i.e. $P_b$),
$E_2$: not L (i.e. H) and not $A_u$ (i.e. $P_b$):                    14 cases

or 15 cases.


The ratio is therefore 81:15 or 27:5.
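
The arithmetic of this example is easily checked by machine. The Python sketch below is ours; it assumes the eight cell counts implied by the ratios quoted in the text (15, 35, 21 and 1 for the auriferous balls; 14, 6, 10 and 66 for the plumbiferous), and the dictionary keys and helper names are hypothetical.

```python
from fractions import Fraction

# Cell counts keyed by (material, weight, colour); the values are those
# implied by the ratios quoted in the text (an assumption on our part).
counts = {
    ("Au", "H", "Y"): 15, ("Au", "L", "Y"): 35,
    ("Au", "H", "G"): 21, ("Au", "L", "G"): 1,
    ("Pb", "H", "Y"): 14, ("Pb", "L", "Y"): 6,
    ("Pb", "H", "G"): 10, ("Pb", "L", "G"): 66,
}

def colour_right(material, weight, colour):
    # The colour expert takes yellow balls for gold and grey ones for lead.
    return (material == "Au") == (colour == "Y")

def weight_right(material, weight, colour):
    # The weight expert takes heavy balls for gold and light ones for lead.
    return (material == "Au") == (weight == "H")

def tally(predicate):
    return sum(n for key, n in counts.items() if predicate(*key))

total = sum(counts.values())
colour_odds = Fraction(tally(colour_right), total - tally(colour_right))
weight_odds = Fraction(tally(weight_right), total - tally(weight_right))
both_right = tally(lambda m, w, c: colour_right(m, w, c) and weight_right(m, w, c))
both_wrong = tally(lambda m, w, c: not colour_right(m, w, c) and not weight_right(m, w, c))

print(colour_odds, weight_odds)   # reliabilities of the two experts
print(both_right, both_wrong)     # cases with both right, and with both wrong
```

With these counts the experts are right 3 times to 1 and 9 times to 5 respectively, and the product of these odds equals the ratio 81:15 of the cases in which both are right to those in which both are wrong.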

Peirce supposes next merely that one witness testifies that the ball is
heavy and the other that it is yellow, neither witness drawing any inference
from his observation. From the testimony that the ball is heavy we may
argue (partly in Peirce’s notation) that


\[ \frac{{}_aA_a}{{}_aP_p} = \frac{\#(Y \wedge H \mid A_u)}{\#(G \wedge H \mid P_b)} = \frac{15}{10} = \frac{3}{2} , \]


while


\[ \frac{{}_aA_p}{{}_aP_a} = \frac{\#(G \wedge H \mid A_u)}{\#(Y \wedge H \mid P_b)} = \frac{21}{14} = \frac{3}{2} . \]


Alternatively, we may consider


\[ \frac{\Pr[A_u \mid H]}{\Pr[P_b \mid H]} = \frac{\Pr[H \mid A_u] \Pr[A_u]}{\Pr[H \mid P_b] \Pr[P_b]} . \]


Therefore, as Peirce has it, 


the argument from its being heavy will be true 3 times to every 
2 times that it is false, whether the color test succeed or fail. 


{7.168} 


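Similarly the argument from the testimony that the ball is yellow gives


\[ \frac{{}_aA_a}{{}_pP_a} = \frac{\#(Y \wedge H \mid A_u)}{\#(Y \wedge L \mid P_b)} = \frac{15}{6} = \frac{5}{2} , \]


while


\[ \frac{{}_pA_a}{{}_aP_a} = \frac{\#(Y \wedge L \mid A_u)}{\#(Y \wedge H \mid P_b)} = \frac{35}{14} = \frac{5}{2} . \]


Once again we could have used


\begin{align*}
\frac{\Pr[A_u \mid Y]}{\Pr[P_b \mid Y]} &= \frac{\Pr[Y \mid A_u] \Pr[A_u]}{\Pr[Y \mid P_b] \Pr[P_b]} \\
&= \left( \frac{50}{72} \times \frac{3}{7} \right) \Big/ \left( \frac{20}{96} \times \frac{4}{7} \right) \\
&= \frac{5}{2} .
\end{align*}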


Peirce next notes that one would be wrong were one to infer, following
the rule, that the odds on the ball’s being auriferous were $\frac{3}{2} \times \frac{5}{2} = \frac{15}{4}$. In
the same way one would be wrong were one to incorporate the fact that $\frac{3}{7}$
of the balls are auriferous, and give the answer¹⁸ as $\frac{3}{2} \times \frac{5}{2} \times \frac{3}{4} = \frac{45}{16}$. The
true odds, he notes, are ${}_aA_a : {}_aP_a = 15 : 14$.

While it might be supposed that the rule did not hold in the case of
arguments, Peirce notes that two errors are in fact involved:


In the first place the odds in favor of a sign’s signifying a fact 
are equal to the ratio of the probability of the occurrence of 
the sign when the fact takes place to the probability of the 
occurrence of the sign when the fact does not take place; and 
in the second place the independence of two signs, considered 
as signifying the same fact, consists in the one occurring with 
the same proportionate frequency whether the other occurs or 
not, and when the fact takes place, and further, with the same 
proportionate frequency whether the other occurs or not, when 
the fact does not take place. But it is not necessary that the one 
should occur with the same proportionate frequency whether 
the other occurs or not, in general, without reference to whether 
the fact occurs or not. The required independence is not found 
in the above numbers. {7.168} 


Trying to put this passage into symbols we shall write $O(F/\bar{F} : S)$ for
the odds in favour of the sign’s signifying F (the fact) rather than $\bar{F}$. Now
in {7.165} Peirce wrote of “the odds or ratio of favorable to unfavorable
probability”. Hence it would seem that


\begin{align*}
O(F/\bar{F} : S) &= \Pr[F \mid S] / \Pr[\bar{F} \mid S] \\
&= \frac{\Pr[S \mid F]}{\Pr[S \mid \bar{F}]} \, \frac{\Pr[F]}{\Pr[\bar{F}]} ,
\end{align*}


although the last factor here, the initial odds, is absent in Peirce’s
formulation. On the other hand, the verbal passage quoted above may be
saying nothing more than


\[ O(F/\bar{F} : S) = \Pr[S \wedge F] / \Pr[S \wedge \bar{F}] . \]


The comment on independence may be given in symbols as: $S_1$ and $S_2$ are
independent if


\begin{align*}
\Pr[S_1 \mid S_2 \wedge F] &= \Pr[S_1 \mid F] \\
\Pr[S_1 \mid \bar{S}_2 \wedge F] &= \Pr[S_1 \mid F] \\
\Pr[S_1 \mid S_2 \wedge \bar{F}] &= \Pr[S_1 \mid \bar{F}] \\
\Pr[S_1 \mid \bar{S}_2 \wedge \bar{F}] &= \Pr[S_1 \mid \bar{F}] ,
\end{align*}


in line with customary ideas of independence.

Although absent in the example considered before, the required independence
is found in the following one:


                 Auriferous.                          Plumbiferous.
            Heavy              Light              Heavy              Light
Yellow,  ${}_aA_a$ = 21    ${}_pA_a$ = 3     ${}_aP_a$ = 10    ${}_pP_a$ = 2
Grey,    ${}_aA_p$ = 14    ${}_pA_p$ = 2     ${}_aP_p$ = 15    ${}_pP_p$ = 3


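Then the odds in favour of a ball’s being auriferous are


\[ O(A_u/P_b) = \Pr[A_u] / \Pr[P_b] = (40/30) = 4/3 , \]


and similarly


\[ O(A_u/P_b : Y) = \Pr[A_u \mid Y] / \Pr[P_b \mid Y] = (24/40)/(12/30) = 3/2 ; \]


\[ O(A_u/P_b : H) = \Pr[A_u \mid H] / \Pr[P_b \mid H] = (35/40)/(25/30) = 21/20 . \]


Thus “the odds in favor of a heavy yellow ball being auriferous” {7.168}
are


\[ O(A_u/P_b : Y \wedge H) = (4/3) \times (3/2) \times (21/20) = 21/10 . \]


This can also be written as


\begin{align*}
\frac{\Pr[A_u \mid Y \wedge H]}{\Pr[P_b \mid Y \wedge H]}
&= \frac{\Pr[A_u] \Pr[Y \mid A_u] \Pr[H \mid A_u \wedge Y]}{\Pr[P_b] \Pr[Y \mid P_b] \Pr[H \mid P_b \wedge Y]} \\
&= \left( \frac{40}{70} \times \frac{24}{40} \times \frac{21}{24} \right) \Big/ \left( \frac{30}{70} \times \frac{12}{30} \times \frac{10}{12} \right) \\
&= \frac{21}{10} .
\end{align*}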
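
The contrast between the two tables can be exhibited numerically. In the Python sketch below, which is ours and purely illustrative, both tables are entered with the cell counts implied by the ratios in the text (an assumption where the printed figures are unclear); the test of independence asks whether, within each material, yellow is equally frequent among heavy and light balls, and the product formed is that of the antecedent odds with the two ratios Pr[sign | gold] / Pr[sign | lead].

```python
from fractions import Fraction

# Cell counts keyed by (material, weight, colour), as implied by the text.
first = {
    ("Au", "H", "Y"): 15, ("Au", "L", "Y"): 35, ("Au", "H", "G"): 21, ("Au", "L", "G"): 1,
    ("Pb", "H", "Y"): 14, ("Pb", "L", "Y"): 6, ("Pb", "H", "G"): 10, ("Pb", "L", "G"): 66,
}
second = {
    ("Au", "H", "Y"): 21, ("Au", "L", "Y"): 3, ("Au", "H", "G"): 14, ("Au", "L", "G"): 2,
    ("Pb", "H", "Y"): 10, ("Pb", "L", "Y"): 2, ("Pb", "H", "G"): 15, ("Pb", "L", "G"): 3,
}

def count(table, material=None, weight=None, colour=None):
    """Total of the cells matching whichever attributes are specified."""
    return sum(n for (m, w, c), n in table.items()
               if material in (None, m) and weight in (None, w) and colour in (None, c))

def signs_independent(table):
    """Within each material, is yellow equally frequent among heavy and light balls?"""
    return all(
        Fraction(count(table, m, "H", "Y"), count(table, m, "H"))
        == Fraction(count(table, m, "L", "Y"), count(table, m, "L"))
        for m in ("Au", "Pb")
    )

def product_of_odds(table):
    """Antecedent odds multiplied by the ratios for 'yellow' and for 'heavy'."""
    def ratio(**sign):
        return (Fraction(count(table, "Au", **sign), count(table, "Au"))
                / Fraction(count(table, "Pb", **sign), count(table, "Pb")))
    prior = Fraction(count(table, "Au"), count(table, "Pb"))
    return prior * ratio(colour="Y") * ratio(weight="H")

def true_odds(table):
    """Odds that a heavy yellow ball is gold, read directly from the cells."""
    return Fraction(count(table, "Au", "H", "Y"), count(table, "Pb", "H", "Y"))

for name, table in (("first", first), ("second", second)):
    print(name, signs_independent(table), product_of_odds(table), true_odds(table))
```

For the second table the product and the true odds agree at 21/10; for the first, where the signs are not independent, the product so formed is 5 while the true odds are 15/14.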
Peirce then considers the general independence problem in the following
symbolic way: let


\[ x = {}_aA_p / {}_pA_p , \qquad y = {}_pA_a / {}_pA_p \]


\[ \xi = {}_aP_p / {}_pP_p , \qquad \eta = {}_pP_a / {}_pP_p . \]


Now independence requires the satisfaction of the conditions


\[ {}_aA_a / {}_pA_p = xy \quad \text{and} \quad {}_aP_a / {}_pP_p = \xi\eta . \]


Note that the first of these equalities may be written, perhaps more
suggestively, as


\[ \frac{{}_aA_a}{{}_aA_p} = \frac{{}_pA_a}{{}_pA_p} , \]

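and similarly the second. Thus the antecedent odds in favour of “auriferous”
are


\begin{align*}
O(A_u/P_b) &= \Pr[A_u] / \Pr[P_b] \\
&= ({}_aA_a + {}_aA_p + {}_pA_a + {}_pA_p) / ({}_aP_a + {}_aP_p + {}_pP_a + {}_pP_p) \\
&= \frac{{}_pA_p (1 + {}_aA_p/{}_pA_p + {}_pA_a/{}_pA_p + {}_aA_a/{}_pA_p)}
        {{}_pP_p (1 + {}_aP_p/{}_pP_p + {}_pP_a/{}_pP_p + {}_aP_a/{}_pP_p)} \\
&= \frac{{}_pA_p (1 + x)(1 + y)}{{}_pP_p (1 + \xi)(1 + \eta)} ,
\end{align*}


on our using the independence conditions (the last line is the only one given
by Peirce). Similarly one finds that


\begin{align*}
O(A_u/P_b : Y) &= \Pr[A_u \mid Y] / \Pr[P_b \mid Y] \\
&= \Pr[Y \mid A_u] \Pr[A_u] / \Pr[Y \mid P_b] \Pr[P_b] ,
\end{align*}


an expression that reduces, under the independence conditions, to


\[ O(A_u/P_b : Y) = y(1 + \eta) / \eta(1 + y) . \]


In the same way one has


\[ O(A_u/P_b : H) = x(1 + \xi) / \xi(1 + x) . \]


Finally Peirce notes that the product of the three is, under independence,


\begin{align*}
O(A_u/P_b : Y \wedge H) &= \Pr[A_u \mid Y \wedge H] / \Pr[P_b \mid Y \wedge H] \\
&= \frac{\Pr[A_u] \Pr[Y \mid A_u] \Pr[H \mid Y \wedge A_u]}{\Pr[P_b] \Pr[Y \mid P_b] \Pr[H \mid Y \wedge P_b]} \\
&= \frac{\Pr[A_u] \Pr[Y \mid A_u] \Pr[H \mid A_u]}{\Pr[P_b] \Pr[Y \mid P_b] \Pr[H \mid P_b]} \\
&= \frac{{}_pA_p (1 + x)(1 + y)}{{}_pP_p (1 + \xi)(1 + \eta)} \cdot \frac{y(1 + \eta)}{\eta(1 + y)} \cdot \frac{x(1 + \xi)}{\xi(1 + x)} \\
&= \frac{{}_pA_p \, xy}{{}_pP_p \, \xi\eta} \\
&= {}_aA_a / {}_aP_a ,
\end{align*}


as one would expect.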
As objections to the method of balancing likelihoods in the study of
ancient history¹⁹, Peirce cites (a) the lack of independence in testimonies
and other arguments, and (b) the mistaken supposition that the narration 
of an event from time past is independent of the likelihood of the tale 
narrated, which 


almost destroys the legitimate weight of any argument from the 
antecedent improbability, unless that improbability is so great 
as to render the story absolutely incredible. {7.176} 


When it comes to the matter of the credibility of testimony, Peirce writes 
“The inappropriateness of the application of the conception of probability 
here is striking” {7.178}, his objection, or so it would seem from the ex- 
amples he instances, being that probability is only correctly applied when 
there are a vast number of observations, many small effects leading to the 
outcome of any particular event. 

A third objection to the method of balancing likelihoods, considered by 
Peirce in {7.182}, is that, once some hypothesis has been found to be prefer- 
able to others (because it is seen to be more probable?), predictions drawn 
from this hypothesis should be tested experimentally, such tests resulting 
either in refutation or in modification of the hypothesis. But Peirce notes
that the merits of the procedure of historical critics have frequently been 
proved to be wrong; such critics either are charlatans or are using a method 
that is wrong in principle. 

Further trenchant remarks on various definitions of, or approaches to, 
probability may be found in Peirce’s review²⁰ of 1867 of the first edition of
John Venn’s The Logic of Chance: we shall not pursue the matter further 
here, apart from noting the mention of the possible need for a hierarchy of 
probabilities in matters of credence and expectation. 


9.6 Bing’s paradox 


In 1879 F. Bing published a paper entitled “Om aposteriorisk Sandsynlighed”
in which the concept of a posteriori probability received close
examination²¹. The paper opens with a statement of the “equally possible”
definition of probability, which is followed by a discussion of some “balls
and bags” type examples. Then follows a statement of the discrete Bayes’s
Theorem, in illustration of which Bing considers the following problem²²:


A blindfolded person withdraws marbles from a bag; some of 
these marbles are found to be white; others black. Knowing that 
the marbles in the bag are either white or black, a question 
arises as to the probability of the bag’s having a particular 
content, e.g. equal numbers of black and white marbles. [p. 5] 


Bing points out that one may assume an equally probable prior distribution 
on the contents of the bag; yet while the answer then obtained is certainly
valid, it is not the answer to the question asked. To illustrate this latter
point, Bing assumes that the drawer of the marbles regards some of the 
drawn light marbles as white and one as yellow, and this of course affects 
the posterior probabilities obtained. 

Passing next to an application of the rule of succession²³, Bing considers
the case in which 100 trials have yielded A, B and C respectively 49, 37 and 
14 times. Then, it is claimed, the probability that the 101st trial will yield 
none of these three letters (but “something else”) is 1/104. On the other 
hand, if we merely consider that a letter has been drawn in 100 trials, 
then the probability against drawing a letter on the next trial is 1/102. 
This position Bing finds paradoxical²⁴, but he notes that “the disparity
originates exclusively from the differing application of Bayes’s Theorem” 
[p. 10]. He points out further that the two solutions are found from 


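\[ \int_0^1 u^{100}(1 - u)\,du \Big/ \int_0^1 u^{100}\,du = 1/102 \]


and


\[ \int x^{49} y^{37} z^{14} (1 - x - y - z)\,d(x,y,z) \Big/ \int x^{49} y^{37} z^{14}\,d(x,y,z) = 1/104 , \]


the integrals in the latter expression being taken over


\[ \{(x,y,z) : x > 0,\ y > 0,\ z > 0 \ \&\ x + y + z < 1\} . \]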
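
Both ratios may be verified with the standard closed form for integrals of this (Dirichlet) type over a simplex, namely $\int \prod_i x_i^{a_i} (1 - \sum_i x_i)^b \, dx = \prod_i a_i! \; b! \,/\, (\sum_i a_i + b + k)!$ for k variables and non-negative integer exponents. The Python sketch below is ours, and the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial, prod

def dirichlet_integral(exponents, remainder_exponent=0):
    """Integral of prod(x_i**a_i) * (1 - sum(x_i))**b over the open simplex.

    exponents holds the non-negative integers a_i and remainder_exponent is b.
    """
    k = len(exponents)
    numerator = prod(factorial(a) for a in exponents) * factorial(remainder_exponent)
    return Fraction(numerator, factorial(sum(exponents) + remainder_exponent + k))

# One class ("a letter") observed in all 100 trials.
pooled = dirichlet_integral([100], 1) / dirichlet_integral([100], 0)
# Three classes A, B and C observed 49, 37 and 14 times.
separate = dirichlet_integral([49, 37, 14], 1) / dirichlet_integral([49, 37, 14], 0)
print(pooled, separate)
```

The two values, 1/102 and 1/104, differ only through the number of classes into which the same 100 observations are divided, which is the disparity Bing remarks upon.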


A further example concerns the sampling of fruit from a large batch of
100,000 pieces, a sample of size 30, all good, being taken. If the price
of a good fruit is 10 Øre and x denotes the ratio of good to rotten fruit,
then the total expectation (and hence the fair price to be paid) in Kronen
is


\[ 10{,}000 \int_0^1 x^{31}\,dx \Big/ \int_0^1 x^{30}\,dx = 9{,}687 . \]


If it is now discovered that each of the fruits sampled was of a different 
type, then the answer is given by considering the ratio of two thirty-fold 
integrals, in which case the value 9,836 is obtained. Bing seems to find the 
disparity unacceptable, for he writes 


most people will certainly regard it as absurd that the buyer 
should pay more for the merchandise because he has sorted the 
samples, in spite of the fact that the individual pieces before 
and after the sorting are assumed to be worth 10 Øre. [p. 15]
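The two prices can be reproduced with the generalised rule of succession. This is a sketch only: it assumes that the thirty-fold integrals correspond to a uniform prior over 31 categories (thirty types of good fruit, each seen once, plus "rotten"), an interpretation adopted here for illustration; the function name is hypothetical.

```python
from fractions import Fraction

def next_draw_probability(count, total, categories):
    """Generalised rule of succession under a uniform prior:
    (count + 1) / (total + categories)."""
    return Fraction(count + 1, total + categories)

BATCH_VALUE = 10_000  # Kronen: 100,000 pieces at 10 Øre each

# Unsorted sample: 30 good out of 30, two categories (good, rotten).
unsorted_price = BATCH_VALUE * next_draw_probability(30, 30, 2)

# Sorted sample (assumed model): 30 good types seen once each, 31 categories.
sorted_price = BATCH_VALUE * sum(
    next_draw_probability(1, 30, 31) for _ in range(30)
)

print(float(unsorted_price), float(sorted_price))
```

Under these assumptions the figures agree with the 9,687 and 9,836 quoted in the text once fractions of a Krone are dropped.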


The same theme is pursued in the next section, where an example relating 
to mortality statistics is presented. If of l + d persons alive at the beginning
of a specified time period (say a year), d died during that period, and if X
denotes the actual probability of not dying in that year, then
$$\Pr[x < X < x + dx] = x^{l}(1-x)^{d}\,dx \Big/ \int_0^1 x^{l}(1-x)^{d}\,dx . \qquad (2)$$


However 


a contradiction immediately arises as soon as allowance is made 
for the fact that it is possible to apply different subdivisions of
time. [p. 16] 


To illustrate this assertion Bing supposes that a population of individuals
initially aged 40 is considered, and that $d_1$ and $d_2$ are the numbers of
individuals who die in the first and second half-year respectively. If X and
Y are the (true) probabilities of not dying in the first and second half-years
respectively, then “the correct a posteriori probabilities” [p. 16] will be
$$x^{l+d_2}(1-x)^{d_1}\,dx \Big/ \int_0^1 x^{l+d_2}(1-x)^{d_1}\,dx$$
and
$$y^{l}(1-y)^{d_2}\,dy \Big/ \int_0^1 y^{l}(1-y)^{d_2}\,dy .$$


The probability that X and Y “samtidigt ere rigtige” (are both correct) is
then
$$\iint_R x^{l+d_2}(1-x)^{d_1}\,y^{l}(1-y)^{d_2}\,dx\,dy \Big/ \left[\int_0^1 x^{l+d_2}(1-x)^{d_1}\,dx \int_0^1 y^{l}(1-y)^{d_2}\,dy\right] ,$$
where $R = \{(x,y) : x>0,\ y>0,\ xy<\alpha\}$, this being the probability that
the survival ratio for ages 40-41 lies in the interval (0, α]. On
defining y = z/x and x = 1 − v(1 − z), one finds that
$$\Pr[\alpha < A < \alpha + d\alpha] = \frac{\alpha^{l}(1-\alpha)^{d_1+d_2+1}\,d\alpha \int_0^1 v^{d_1}(1-v)^{d_2}\,[1 - v(1-\alpha)]^{-1}\,dv}{\int_0^1 x^{l+d_2}(1-x)^{d_1}\,dx \int_0^1 y^{l}(1-y)^{d_2}\,dy} \qquad (3)$$


where A denotes the survival ratio. Bing notes that this formula differs
from (2) above, the latter yielding (l + 1)/(l + 2) when $d_1 = 0 = d_2$, while
multiplication of (3) by α and integration from 0 to 1 yields, for the same
d values, [(l + 1)/(l + 2)]². The extension of this situation to the division of
the year into n parts, rather than two, shows that, in the limit as n → ∞,
each individual aged 40 must die within the specified time period²⁵.
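The collapse to zero is easy to see numerically. The sketch below assumes, as in Bing's argument, no observed deaths and an independent uniform prior on the survival probability of each subdivision; the function name is illustrative.

```python
from fractions import Fraction

def mean_survival(l, parts):
    """Posterior mean of the whole-period survival probability when none of
    l persons dies and each of `parts` subdivisions of the period receives
    its own independent uniform prior: ((l + 1) / (l + 2)) ** parts."""
    return Fraction(l + 1, l + 2) ** parts

for parts in (1, 2, 12, 200):
    print(parts, float(mean_survival(10, parts)))
```

With l fixed, each further subdivision multiplies the expected survival probability by another factor (l + 1)/(l + 2), so the product tends to zero.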

Bing suggests that this is perhaps the first time that Bayes’s Theorem
has been applied to a subdivided year, and furthermore queries why it
should be more logical to suppose a uniform distribution of probability for
the 40-41 group rather than for the 40-40½. Claiming that “Bayes’s Theorem
is entirely unreliable for all cases in which no a priori information
is available as to necessary causes” [p. 18], Bing turns his attention to the
following situation.

Suppose that of a population of n living individuals, $d_1$ die in the first
year and $d_2$ in the second: suppose further that $X_1$ and $X_2$, the “relationships”
between the numbers dying in the first or second year and the
numbers living at the start of those years, are bound by an unknown function
φ that satisfies
$$\varphi(x_1,x_2,d_1,d_2,n)\,dx_1\,dx_2 = \varphi(x_2,x_1,d_2,d_1,n)\,dx_2\,dx_1 . \qquad (4)$$


If $Y_1$ and $Y_2$ are the probabilities of surviving the first and second years,
then
$$\Pr[y_1 < Y_1 < y_1 + dy_1] = \psi(y_1,d_1,n)\,dy_1 ,$$
$$\Pr[y_2 < Y_2 < y_2 + dy_2] = \psi(y_2,d_2,n-d_1)\,dy_2 .$$
Under the assumption that $Y_1$ and $Y_2$ are independent, we have
$$\Pr[y_1 < Y_1 < y_1 + dy_1,\ y_2 < Y_2 < y_2 + dy_2] = \psi(y_1,d_1,n)\,\psi(y_2,d_2,n-d_1)\,dy_1\,dy_2 . \qquad (5)$$
The transformation $x_1 = 1 - y_1$, $x_2 = y_1(1 - y_2)$ applied to (5) and the use
of (4) now yield the equation
$$f(x_1,x_2,d_1,d_2)\,dx_1\,dx_2 = f(x_2,x_1,d_2,d_1)\,dx_2\,dx_1 ,$$
where $f(x_1,x_2,d_1,d_2)$ is defined as
$$(1-x_1)^{-1}\,\psi(1-x_1,d_1,n)\,\psi\big((1-x_1-x_2)/(1-x_1),\,d_2,\,n-d_1\big) .$$


On taking (natural) logarithms, applying the operator $\partial^2/\partial x_2\,\partial x_1$, and
setting $z = (1 - x_1 - x_2)/(1 - x_1)$, Bing finds on solving the resulting
differential equation that
$$\psi(1-x_1,d_1,n) = M\,(1-x_1)^{a(n-d_1)+k}\,x_1^{a d_1 - 1} \qquad (6)$$
where a and k are constants and M is chosen so that $\int_0^1 \psi\,dx_1 = 1$. In a
comment in a subsequent paper (which will be discussed in due course),
Bing states that k should be set at −1, because we ought to have
$$\psi(y_1,d_1,n) = \psi(1-y_1,\,n-d_1,\,n) ,$$
though he also says that one may keep k arbitrary.

Attention is drawn to the correspondence between this result and Bayes’s
Theorem. Furthermore, in order that the expression in (6) define a density,
one must clearly have a positive (if k = −1), and even this choice proves
unreasonable if $d_1 = 0$. (Whether the derivation leading to (5) is in fact
valid if $d_1 = 0$ is not mentioned.) Bing then concludes²⁶


Den eneste mulige Funktionsform, som ikke giver Strid, har
altsaa vist sig ubrugelig, og dermed mener jeg, at det er bevist,
at der aldeles ikke existerer nogen aposteriorisk Sandsynlighed,
naar der er Tale om Problemer, hvor man forud er absolut uvidende
om de virkende Aarsager. [p. 21]

(The only possible functional form that gives rise to no contradiction
has thus proved useless, and I therefore consider it proved that no
a posteriori probability exists at all in problems where one is
beforehand absolutely ignorant of the operating causes.)


Whether all this is really necessary to conclude that posterior probabilities 
cannot exist when there are no priors is moot. 

Bing’s work did not pass unnoticed: in a paper entitled “Bemærkninger
til Hr. Bings Afhandling „Om aposteriorisk Sandsynlighed“”, also published
in 1879, Lorenz raised several criticisms, the main thrust of which was
Bing’s misunderstanding of the practical applications of Bayes’s Theorem²⁷.

Lorenz comments firstly on Bing’s “bag and balls” example, and on that 
concerned with the fruit shipment, stressing that the evaluation of posterior 
probabilities is dependent on a clear statement of the initial information 
at one’s disposal. When it comes to considering the example on mortal- 
ity statistics, however, Lorenz’s criticism is sharpened. He asserts firstly 
that Bing has proved more than was concluded in the preceding quotation: 
he has in fact shown that “der aldeles ikke existerer nogen aposteriorisk 
Sandsynlighed” [p. 61] (there is no such thing as a posteriori probability²⁸).
For if f(·) denotes the prior probability density function of $x_1$ (say), then
Bing’s $\psi(1 - x_1, d_1, n)$ becomes $f(x_1)(1-x_1)^{n-d_1}x_1^{d_1}$, and hence if ψ has
no usable form, Bayes’s Theorem must be false.

Consideration of the two cases $d_1 = 0$, $d_2 = n$ and $d_1 = n$, $d_2 = 0$ in turn
persuades Lorenz that Bing’s basic assumption that
$$\varphi(x_1,x_2,d_1,d_2,n)\,dx_1\,dx_2 = \varphi(x_2,x_1,d_2,d_1,n)\,dx_2\,dx_1$$
is wrong, and further investigation leads him to conclude that Bing’s paradox
illustrates the unreasonableness of the assumption that $X_1$ and $X_2$ are
independent.

Unconvinced by Lorenz’s comments, Bing eagerly seized the opportu- 
nity offered him by the editors of the Tidsskrift for Mathematik to reply; 
and this reply in turn was followed by a rejoinder from Lorenz, in which 
Bing’s expression (5) for the posterior probability was used to show the 
equivalence of 


$$x^{al-1}(1-x)^{ad-1}\,dx \Big/ \int_0^1 x^{al-1}(1-x)^{ad-1}\,dx$$
(an analogue of (2)) and
$$\frac{\alpha^{al-1}(1-\alpha)^{a(d_1+d_2)-1}\,d\alpha \int_0^1 v^{ad_1-1}(1-v)^{ad_2-1}\,dv}{\int_0^1 x^{a(l+d_2)-1}(1-x)^{ad_1-1}\,dx \int_0^1 y^{al-1}(1-y)^{ad_2-1}\,dy}$$
(the probability that the proportion between 40 and 41 years lies between
α and α + dα).
Satisfied with his exposition, Lorenz stated at the conclusion of this
paper that he regarded the matter as closed, a view that was not shared
by Bing, who forcefully reiterated his argument. We shall not pursue the
controversy further here²⁹.


9.7 A question of antisepticism 


In December 1881 the following problem was posed by Donald MacAlister³⁰
(1854-1934) in the columns of The Educational Times³¹:


Of 10 cases treated by Lister’s method, 7 did well and 3 suf- 
fered from blood-poisoning; of 14 cases treated with ordinary 
dressings, 9 did well and 5 had blood-poisoning; what are the 
odds that the success of Lister’s method was due to chance? 


[Problem 6929] 


This seemingly simple question occasioned much controversy in subsequent 
issues of the journal, and we shall take a brief look at some of the opinions 
(and hackles!) raised. 

The first solution proposed, in the issue for 1st February 1882, was by
Alexander MacFarlane (1851-1913), and since it gave rise to much com- 
ment, we shall present it in full. 


Let p denote the chance of a case treated by Lister’s method 
doing well, and q the chance of a case treated with ordinary 
dressings doing well, then p = 7/10 and q = 9/14. But Lister’s
method consists in the ordinary dressings with the additional 
use of an antiseptic; hence the effect of the antiseptic is p — q, 
that is 2/15 [sic]. Hence the odds that in a given case the success 
of Lister’s method is not due to the characteristic part of it is 


q/(p — q), that is 45/4. [p. 77] 


Commenting on this solution, the proposer describes it as “very inadequate.”
This is followed by the sentiment


It is a good rule in Probabilities to refrain from introducing any 
datum of your own into the conditions of the question [loc. cit.], 


a suggestion as to which there might well be some debate. But be that as 
it may: after some further criticism of MacFarlane’s solution, MacAlister 
proceeds to his own. To this end he rephrases the question in terms of balls 
and urns: suppose that two urns A and B each contain a large number 
of white and black balls. From A, a + b balls are drawn in succession (a
being white), while p + q are similarly drawn from B (p being white). If
a/(a+b) is found to be greater than p/(p+q), what are the odds that the
proportion of white balls in B is actually less than that in A? (One might
well ask whether MacAlister is obeying his own Good Rule in thus styling
the problem.) If P denotes the probability that there are fewer white balls
in B than in A, then
$$\frac{P}{1-P} = \frac{\sum_{s=1}^{n-1}\sum_{t=1}^{s-1} s^{a}(n-s)^{b}\,t^{p}(n-t)^{q}}{\sum_{s=1}^{n-1}\sum_{t=s}^{n-1} s^{a}(n-s)^{b}\,t^{p}(n-t)^{q}} ,$$
so that, n being large,
$$P = \binom{a+b+1}{a}\binom{p+q+1}{p+1} \Big/ \binom{a+b+p+q+2}{a+p+1} \cdot S ,$$
where S denotes the (terminating) series
$$S = 1 + \frac{a}{b+2}\,\frac{q}{p+2} + \frac{a(a-1)}{(b+2)(b+3)}\,\frac{q(q-1)}{(p+2)(p+3)} + \cdots ,$$
with a similar expression for 1 − P.

Using this result with a = 5, b = 9, p = 3, q = 7 MacAlister finds that
P = 0.59825, whence P/(1—P) = 1.49. (We shall comment on this solution 
later.) 
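MacAlister's figures can be checked in exact arithmetic. The sketch below assumes independent uniform priors on the two proportions and reads the result as a leading binomial factor multiplied by the terminating series S; the function names are illustrative, and `direct_P` is an independent cross-check obtained by integrating one Beta distribution function against the other.

```python
from fractions import Fraction
from math import comb, factorial

def series_P(a, b, p, q):
    """Leading binomial factor times the terminating series S."""
    lead = Fraction(comb(a + b + 1, a) * comb(p + q + 1, p + 1),
                    comb(a + b + p + q + 2, a + p + 1))
    series = term = Fraction(1)
    i = 0
    while a - i > 0 and q - i > 0:
        term *= Fraction((a - i) * (q - i), (b + 2 + i) * (p + 2 + i))
        series += term
        i += 1
    return lead * series

def direct_P(a, b, p, q):
    """Pr[theta_B < theta_A] for theta_A ~ Beta(a+1, b+1), theta_B ~ Beta(p+1, q+1)."""
    n_a, n_b = a + b + 1, p + q + 1
    total = Fraction(0)
    for j in range(p + 1, n_b + 1):
        total += Fraction(
            comb(n_b, j) * factorial(a + j) * factorial(b + n_b - j) * factorial(n_a),
            factorial(n_a + n_b) * factorial(a) * factorial(b))
    return total

P = series_P(5, 9, 3, 7)
print(float(P), float(P / (1 - P)))
```

The exact value agrees with MacAlister's P to about three decimal places and gives odds of roughly 1.49 to 1.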

In the issue of the first of March 1882 of The Educational Times prac- 
tically an entire page is taken up with this problem, the discussion being 
opened by the Editor who describes MacFarlane’s solution as “a brief but 
somewhat obscure process” [p. 103]. MacFarlane’s defence is the first to 
be given: he re-affirms his earlier solution, but this time adduces more 
reasoning. The Scottish theme is further embellished by Hugh MacColl 
(1837-1909) who charges MacAlister with a violation of his own principle, 
inasmuch as 


He seems to me to have ‘introduced into the conditions of the 
question’ the datum that, independently of the experiments, it 
is an even chance (or 1/2) whether Lister’s method has any 
advantage over the ordinary methods. [p. 103] 


Elizabeth Blackwood makes some general comments: these in turn are 
followed by a lengthy reply from MacAlister, in which he defends himself 
against MacColl by saying (perhaps rather weakly) that his assumption “is 
surely not merely a just assertion, but the only one possible” [loc. cit.]. 
His comment on the immediately preceding remarks is rather unkind, viz. 
“Miss Elizabeth Blackwood has perhaps not read my solution” [loc. cit.]. 

A new contestant enters the lists in the issue of the first of April. William 
Whitworth (1840-1905) proposes to denote by p the probability of success
under the old treatment, and by μp the probability of success under Lister’s
treatment. The a priori probability of success is then
$$P = p^{9}(1-p)^{5}(\mu p)^{7}(1-\mu p)^{3}$$
where 0 < μ < 1/p. The probability that Lister’s method is of no advantage
is then
$$\Pr[\mu < 1] = \int_0^1 (\mu p)^{7}(1-\mu p)^{3}\,d\mu \Big/ \int_0^{1/p} (\mu p)^{7}(1-\mu p)^{3}\,d\mu = 165p^{8} - 440p^{9} + 396p^{10} - 120p^{11} .$$
If all we know is given in the statement of the question, we must set p =
9/14, and we then find the chance as 0.0078. The odds in favour of Lister’s
method being advantageous are then about 229 to 2. Whitworth notes
further that P is maximal for μ = 7/(10p), which is thus the most likely
value of μ, and he summarizes his results as follows:


(α) The chance (from the observed cases only) that Lister’s
method should precisely make no difference, is less than any
assignable chance. The question as stated can only mean this,
and the only true answer is zero.

(β) The odds that Lister’s method is beneficial, as against the
position that it is either useless or injurious, are about 114 : 1.

(γ) The odds in favour of the statement that Lister’s treatment
will succeed in 7 cases out of 10, as against the position that
his treatment makes no difference, are about 21 : 20.

[p. 127] 


MacColl next re-enters the field, pointing out essentially that, in his 
view, the only correct interpretation of probability is in terms of long-run 
frequencies, and suggesting that it might be wiser to denote the unknown 
initial probability by x (say) and obtain the answer in terms of this quantity,
rather than to make the (probably false) assumption that the a priori
chance is 1/2.

Blackwood has the last word with some general remarks, and in a final 
sally on the first of May she takes exception to Whitworth’s solution on 
account of his assuming that μ has a uniform distribution: “For this assumption
I can discern no warrant whatever in the data of the question”
[p. 153].

In his solution to the problem MacAlister acknowledged the work of 
Carl Liebermeister (1833-1901), who had “published several tracts in which 
problems similar to mine are very clearly discussed” [p. 78]. The pertinent 
problem, given in Liebermeister [1877], is reported by Winsor [1948] as 
follows: 


A sample from population 1 has given a failures and b successes;
a sample from population 2 has given p failures and q successes.
We have
$$\frac{p}{p+q} < \frac{a}{a+b} .$$
Required, the probability that the true proportion of failures in
population 2 is less than that in population 1. [p. 166]


Letting α and β be the probabilities of failure in the first and second populations
respectively, we find that
$$\Pr[\text{observed result} \mid \alpha, \beta] = \binom{a+b}{a}\binom{p+q}{p}\,\alpha^{a}(1-\alpha)^{b}\,\beta^{p}(1-\beta)^{q} .$$
If one supposes that α and β are independently uniformly distributed over
[0,1], then by Bayes’s Theorem
$$P = \iint_{\beta<\alpha} Q(\alpha,\beta)\,d\alpha\,d\beta \Big/ \iint Q(\alpha,\beta)\,d\alpha\,d\beta$$


SS) “~ perteeonn, seen 
ea a b a+b+1 
where $Q(\alpha,\beta) = \alpha^{a}(1-\alpha)^{b}\,\beta^{p}(1-\beta)^{q}$.

Winsor [1948] compares the expression for 1—P derived from this formula 
with that given by Fisher’s analysis of the 2 x 2 table, and concludes that 
“Liebermeister’s probability is the same that Fisher would calculate from 
a table with the frequencies on one diagonal increased by unity” [p. 167]. 
He concludes too that Liebermeister’s method yields, for small samples, 
smaller values of 1 — P (“and hence apparently stronger indications of sig- 
nificance” [p. 168]) than Fisher’s. 

The relationship between Liebermeister’s test and Fisher’s has been ex- 
plored in detail in Seneta [1994]: the following brief discussion owes much 
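to this paper³². In the case of the binomial trials as described by Winsor
and given above, Fisher [1970, §21.02] showed that the hypothesis test of
$$H_0 : \alpha = \beta \quad \text{vs} \quad H_1 : \alpha > \beta$$
is based on the p-value
$$p = \sum_{k \ge a} \binom{a+b}{k}\binom{p+q}{a+p-k} \Big/ \binom{a+b+p+q}{a+p} .$$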

The two tests may then be compared by examining the relative sizes of p 
and 1 − P, with rejection of $H_0$ if the value is small. The connexion with
Fisher’s statistic is perhaps more transparent if Liebermeister’s is written
(as he gave it) in the form
$$1 - P = \sum_{k \ge a+1} \binom{a+b+1}{k}\binom{p+q+1}{a+1+p-k} \Big/ \binom{a+b+p+q+2}{a+1+p} .$$
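Winsor's two observations can be verified exactly for any small table. The sketch below is illustrative only: `fisher_p` is the one-sided hypergeometric tail, and `liebermeister_tail` computes 1 − P directly from independent uniform priors by integrating one Beta distribution function against the other; both function names are assumptions of this sketch.

```python
from fractions import Fraction
from math import comb, factorial

def fisher_p(a, b, p, q):
    """One-sided p-value: hypergeometric tail for k >= a failures in sample 1."""
    upper = min(a + b, a + p)
    tail = sum(comb(a + b, k) * comb(p + q, a + p - k)
               for k in range(a, upper + 1))
    return Fraction(tail, comb(a + b + p + q, a + p))

def liebermeister_tail(a, b, p, q):
    """1 - P = Pr[beta >= alpha | data] with alpha ~ Beta(a+1, b+1) and
    beta ~ Beta(p+1, q+1), i.e. independent uniform priors."""
    n1, n2 = a + b + 1, p + q + 1
    prob_less = Fraction(0)
    for j in range(p + 1, n2 + 1):
        prob_less += Fraction(
            comb(n2, j) * factorial(a + j) * factorial(b + n2 - j) * factorial(n1),
            factorial(n1 + n2) * factorial(a) * factorial(b))
    return 1 - prob_less

table = (5, 9, 3, 7)  # a, b, p, q
print(float(liebermeister_tail(*table)), float(fisher_p(*table)))
```

For this table Liebermeister's value equals Fisher's p-value for the table with a and q each increased by one, and it is smaller than Fisher's p-value for the original table.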




9.8 Francis Ysidro Edgeworth (1845-1926) 


As Stigler [1986a, p. 305] has noted, Edgeworth³³ stands out as a curiosity
among nineteenth-century statisticians. His formal training in classical lit- 
erature and matters jurisprudential, a training evinced by a literary style 
that is erudite and entertaining, subtle and succinct, was followed by a deep 
personal study of mathematics, the fruit of which is abundantly evident in 
his various writings on ethics, economics and statistics³⁴. Of those in the
last group, the earliest falling within the scope of our study is a paper, 
published in 1883, on the method of least squares. 

The following passage (I. A(3) in Edgeworth’s paper) will serve as an
example of the use made here of inverse probability³⁵:


Given a set of observations $x_1$, $x_2$, &c., and given that they
have been generated by divergence according to one and the 
same probability-curve from a single point, but given neither 
that point nor the modulus, to find both. [p. 366] 


The actual finding of the mean and the variance need not concern us here 
(the method is that of maximum likelihood). Germane however is the following
discussion. Let us define
$$f(\xi, c) = \left(\frac{1}{c\sqrt{\pi}}\right)^{n} \exp\left(-\frac{1}{c^{2}}\sum_{i=1}^{n}(x_i - \xi)^{2}\right) .$$
Then, P denoting the prior distribution, the posterior probability is
$$P\,f(\xi,c)\,d\xi\,dc \Big/ \iint P\,f(\xi,c)\,d\xi\,dc .$$
Edgeworth now supposes that the prior distribution is of the form $kc^{-2}$
(where k is constant), an expression obtained by assuming that ξ and c are
independent and that h = 1/c has a uniform distribution. With this prior,
and on integrating the joint density, Edgeworth finds that
$$f(\xi \mid x_1,\ldots,x_n) \propto \left[\sum (x_i - \bar{x})^{2} + n(\bar{x} - \xi)^{2}\right]^{-(n+1)/2} .$$
This, as Welch [1958, p. 779] has noted, reduces to³⁶
$$f(t \mid x_1,\ldots,x_n) \propto \left[1 + t^{2}/(n-1)\right]^{-(n+1)/2}$$
on one’s putting $t = \sqrt{n(n-1)}\,(\bar{x} - \xi) \big/ \left[\sum (x_i - \bar{x})^{2}\right]^{1/2}$.
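The exponent −(n + 1)/2 can be checked numerically. The sketch below assumes the likelihood is proportional to c⁻ⁿ exp(−S/c²), with S the sum of squared deviations from ξ, and a prior proportional to c⁻²; after the substitution h = 1/c the integral over c becomes the integral of hⁿ exp(−Sh²) over h > 0. The function name and the grid settings are illustrative choices.

```python
import math

def marginal_over_h(S, n, steps=100_000, h_max=12.0):
    """Trapezoidal estimate of the integral of h**n * exp(-S * h**2)
    over 0 < h < h_max; the integrand vanishes at both ends of the grid."""
    dh = h_max / steps
    total = 0.0
    for i in range(1, steps):
        h = i * dh
        total += h ** n * math.exp(-S * h * h)
    return total * dh

n = 5
ratio = marginal_over_h(1.0, n) / marginal_over_h(2.0, n)
print(ratio, 2.0 ** ((n + 1) / 2))
```

Doubling S divides the marginal by 2 raised to the power (n + 1)/2, which is the behaviour of a quantity proportional to S raised to the power −(n + 1)/2.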


In 1884 Edgeworth published in the Philosophical Magazine a paper
entitled “A priori probabilities.” After having pointed out that


In the measurement of a physical quantity it is generally assumed
that, prior to observation, one value of the quaesitum is
as likely as another [p. 204],


he illustrates this point by considering a problem he had raised in an earlier
paper [Edgeworth 1883], viz. consider a set of observations $\{x_1, x_2, \ldots\}$
“diverging according to a given probability-curve” from a point x. This
point x is found by solving
$$\frac{d}{dx}\left\{ p \left(\frac{h}{\sqrt{\pi}}\right)^{n} \exp\left[-h^{2}\sum (x - x_i)^{2}\right] \right\} = 0 ,$$
“where p is the à priori probability that the real value of the quaesitum is
between x and x + Δx” [p. 204]. He points out the modification required
if p, rather than being constant, is equal to Δx φ(x). Mention is also made
of more complicated problems, and the general remark is made that 


In so far as these methods are applications of Inverse Probabil- 
ities they involve a priori assumptions [p. 206], 


this remark encompassing the rule of succession. 

Edgeworth points out further that when calculations a posteriori of the
probability that some phenomenon is not due to chance are made, some
assumption as to the a priori probability of the existence of chance is
needed, and concludes that the theory advocated by Boole and Donkin is,
in respect of such a priori probabilities, more correct than the practice of
Laplace and Herschel. Supposing, then, that a priori probabilities are in
fact needed, Edgeworth finds this need to be so far satisfied as


to allow of a mathematical, though not a numerical, inference
in cases where the a posteriori probability has a limiting value,
provided that an involved à priori probability is not extreme.


[p. 207] 


The correction introduced in the earlier example may safely be ignored 
when n is indefinitely large provided that φ′(x)/φ(x) is finite. The general
argument appears to be that the effect of the prior diminishes with increas- 
ing experience. 

It is also pointed out that if X has a uniform distribution over (0,1), 
such will not be the case for X², and that when the form of a function is
completely unknown one may assume that one which makes for most ease 
of calculation. 
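The non-invariance of the uniform distribution is quickly illustrated. The sketch below takes the transformed quantity to be the square of X (an assumption of this example) and uses the elementary fact that Pr[X² ≤ t] = Pr[X ≤ √t]; the function name is illustrative.

```python
import math

def cdf_of_square(t):
    """Pr[X**2 <= t] for X uniform on (0, 1): this is sqrt(t),
    whereas a uniform variable would give t itself."""
    return math.sqrt(t)

for t in (0.04, 0.25, 0.81):
    print(t, cdf_of_square(t))
```

Half of the probability for X² lies below 1/4, so X² cannot itself be uniformly distributed over (0, 1).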

Edgeworth devotes another paper³⁷ of 1884, published in Mind and entitled
“The philosophy of chance”, chiefly to criticism of Venn’s The Logic of
Chance³⁸. He agrees with Venn on the essential similarity between inverse
(quâ inverse) and direct probability, and suggests further that


the much decried method of Bayes may be employed to deduce 
from the frequently experienced occurrence of a phenomenon 
the large probability of its recurrence [p. 228], 

a remark that might perhaps be interpreted as support for the rule of
succession.

The assignment of equal probability-constants in the case in which noth- 
ing is known is founded upon the “rough but solid experience” [p. 230] that 
such constants do, in practice, tend to have one value as often as another. 
Moreover, 


The ridicule which has been heaped upon Bayes’s theorem and 
the inverse method will be found only applicable to the pre- 
tence, here deprecated, of eliciting knowledge out of ignorance, 
something out of nothing. The most formidable objection is that 
which was made by Boole, and is repeated by Mr. Venn, Mr. 
Peirce, and others with approbation. Our procedure in treating
one value as à priori not less likely than another is, it is
said, of a quite arbitrary character, and apt to lead to different
conclusions from the plausible one which we have reached by 
accident. [p. 230] 


A parody of Boole’s argument is given, the conclusion being that an appeal 
to experience is of prime importance: an appeal Pearson [1920a] notes “from 
which Bayes and Laplace ought to have started” [p. 4], though it might in 
fact be argued that both Bayes and Laplace did in fact base their arguments 
upon some sort of prior experiment. 

Following a discussion of some examples, Edgeworth states 


The preceding examples ... may show that the assumptions 
connected with ‘Inverse Probability,’ far from being arbitrary, 
constitute a very good working hypothesis. They suggest that 
the particular species of inverse probability called the ‘Rule of 
Succession’ may not be so inane as Mr. Venn would have us 
believe. [p. 234] 


In 1885 Edgeworth published a paper in which the application of probability
to psychical research was considered³⁹. The sort of problem under
examination is the following: one person chooses a letter of the alphabet, 
say, and a second guesses the choice, the experiment being repeated N 
times. Under the supposition of mere chance, the most probable number 
of successes is m = Nu (here u = 1/24)⁴⁰. Similar series of trials are then
carried out, and the following three problems present themselves: 


What probability in favour of the existence of some agency other 
than chance is afforded by (1) a single series such as the first, 
in which the successes are in excess [of m]; (2) a set of series
such as the first two or three, in all of which the successes are 
in excess; (3) a chequered set of series in some of which the 
successes are in excess, in others in defect? [p. 190] 


These questions, Edgeworth stated, could be reduced to the following:


Out of an urn known to contain an infinite number of white
and black balls in the proportion u : 1 − u have been drawn N
balls whereof N(u + v) are white; and again N′ balls whereof
N′(u + v′) are white; and so on. v is sometimes negative. What is
the probability in favour of agency other than chance deducible 
(1) from the first series; (2) from a set of series in which v is 
positive; (3) from a chequered set of series? [p. 190] 


The evaluation of posterior probabilities of this kind involves three op- 
erations “which may be distinguished in analysis, though implicated in 
practice” [p. 191], these operations being the following 


The first (I.) is to determine what function the required probability
is of two sets of variables; namely, à priori probabilities
not given by (or deducible from) direct statistical experience,
and “objective” probabilities (to use the phrase of Cournot),
which are derived from statistical experience. The second operation
(II.) is the treatment of the à priori probabilities; the
discovery, assumption, or ignoration of those unknown quantities.
The third operation (III.) is the evaluation of the objective
probabilities. [p. 191] 


Two schemata are given for the solution of the first problem: the first uses 
Bayes’s Theorem, while the second “savour[s] more of Bernoulli than of
Bayes” [p. 192].

In the first (i.e. the Bayes) method, the drawing of the N(u + v) white 
balls is viewed as the result of some real constitution of balls in the urn. 
Letting φ(x) denote the a priori probability that the desired ratio is the
particular ratio x/N, and f(x) the objective probability that precisely m+n
white balls would be drawn in N trials were x : N − x the real distribution
of the balls, Edgeworth gives the probability that the observed event results
from some possibility greater than u as
$$\sum_{Nu < x \le N} \varphi(x)\,f(x) \Big/ \sum_{x=0}^{N} \varphi(x)\,f(x) .$$


This method Edgeworth finds to be that used by Laplace in his discussion 
in the Théorie analytique des probabilités on the difference in the ratio of
male to female births, and he observes (1) that Laplace’s example is con- 
cerned with a finite number of observations, and (2) the “characteristic 
neglect” by Laplace of prior probabilities. 

The second method, claimed to be more appropriate to the matter con- 
sidered here, runs as follows: 


Let α be the à priori probability that chance alone should have
been the régime under which the observed event occurred. Let
p be the objective probability that, chance being the régime,
a deviation from u in the direction of success at least as great
as v should occur. Let β be the à priori probability that there
should have been some additional agency. Let γ be the (not
in general objective) probability that, such additional agency
existing, the observed event should occur. Then the required a
posteriori probability in favour of the additional agency is
$$\frac{\beta\gamma}{\beta\gamma + \alpha p} ; \quad \text{where } \alpha = 1 - \beta . \qquad (7)$$
[p. 192]


Now this is rather confusing, and I can only assume that evidence of what 
Mirowski [1994, p. 5] describes as Edgeworth’s careless proof-reading is to 
be seen here. I would suggest that we make the following identifications:


α = Pr[additional agency]
β = Pr[chance]
γ = Pr[event | additional agency]
p = Pr[event | chance] .


This interpretation is substantiated by a later formula 


$$\frac{\beta\gamma'}{\beta\gamma' + \alpha p'} ,$$
where


p′ is the (very small) probability that the particular deviation v
should occur under the régime of chance; γ′ is the probability
(presumably of the same order of magnitude) that, an additional
agency existing, the exact deviation v should have occurred; α
and β are as before. [p. 193]


As a user of the second method Laplace is again cited, this time in con- 
nexion with an example concerning the cause of differences in barometrical 
pressures. 

Both methods, Edgeworth suggests, may be seen as defective in certain 
respects: the first, in that our exact knowledge of u is not used, and the 
second in that while v is given, consideration is taken only of the fact that 
the deviation belongs to the class from v to u— 1. 

Still under the first operation (i.e. the determination of the choice of 
the appropriate function of the a priori probabilities and the “objective” 
probabilities), consideration is now given to the second problem, the case 
of a set of series. Edgeworth suggests that the expression (7) should again
be used, with p replaced by pp′p″ ... and with the meaning of γ similarly
altered. The third problem is to be handled by grouping the given series in
such a way that a set is formed, in all of which the successes are in excess.

The reader is referred to Edgeworth [1884a] for a discussion of the methods
appropriate to the second operation, that is, the treatment of the a
priori probabilities. It is suggested that in the Bayesian method, the function
φ(x) be ignored, particularly when N is large, Edgeworth contending
that there are even empirical reasons for considering it as constant. As
regards expression (7), here α and β should both be put equal to 1/2, consonant
with experience, while “To put that same value for γ, appears, while
not contradicted by, yet less agreeable to, experience” [p. 195]. If none of
α, β and γ is very small and if p is very small, then (7) reduces by Taylor’s
formula to approximately 1 − αp/(βγ), and hence p may be taken as a
rough measure of the desired posterior probability.
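The quality of the first-order approximation is easily gauged. The sketch below follows the definitions in Edgeworth's own statement of the second method (prior for chance, prior for additional agency, and the two probabilities of the event); the numerical inputs and the function names are hypothetical, chosen only for illustration.

```python
def posterior_agency(prior_agency, lik_agency, lik_chance):
    """Posterior probability of an additional agency: the prior for agency
    times the event probability under agency, divided by that product plus
    the corresponding product for chance (prior for chance = 1 - prior_agency)."""
    prior_chance = 1.0 - prior_agency
    weighted_agency = prior_agency * lik_agency
    return weighted_agency / (weighted_agency + prior_chance * lik_chance)

def first_order(prior_agency, lik_agency, lik_chance):
    """First-order Taylor approximation, valid when lik_chance is very small."""
    prior_chance = 1.0 - prior_agency
    return 1.0 - prior_chance * lik_chance / (prior_agency * lik_agency)

# Hypothetical inputs: both priors 1/2, event probability 1/2 under agency,
# and a chance probability of 0.001.
exact = posterior_agency(0.5, 0.5, 0.001)
approx = first_order(0.5, 0.5, 0.001)
print(exact, approx)
```

With these inputs the exact and approximate values differ by only a few parts in a million, and both are governed almost entirely by the smallness of the chance probability.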

In the second problem, considered under the heading of the second op- 
eration, the effective measure (“the real grip of proof”) of the posterior 
probability is seen to be pp’..., while the third problem is found to have 
been resolved into the other two. 

Attention now passes to the consideration of the third operation, the evaluation
of the objective probabilities. Edgeworth uses examples culled from
other writers in a fairly natural way, and concludes with the following ob- 
servation: 


Such is the evidence which the calculus of probabilities affords 
as to the existence of an agency other than mere chance. The 
calculus is silent as to the nature of that agency — whether it 
is more likely to be vulgar illusion or extraordinary law. That 
is a question to be decided, not by formule and figures, but by 
general philosophy and common sense. [p. 199] 


A useful summary of the state of the art was provided by Edgeworth 
in his article “Probability” in the eleventh edition of the Encyclopedia 
Britannica in 1911. Here the marriage of what Bowley [1928, p. 6] terms 
Edgeworth’s “metaphysical conception of probability” and his statistical 
investigations is clearly visible: indeed, Edgeworth is a realization of Dick- 
ens’s Nicholas Tulrumble, who “contracted a relish for statistics, and got 
philosophical”. 

Arguing against Laplace’s definition of probability, Edgeworth urges here
that “merely psychological facts can at best afford a measure of belief, not
of credibility” [§2, p. 377], but he nevertheless finds the frequency view
“not so diametrically opposed as may at first appear” [§3, p. 377]. Again
the question of the invariance of the prior distribution is raised, and it is
suggested that, when values are constrained to lie in a small interval, any
(reasonable) function “of a quantity which assumes equivalent values with
equal probability” [§8, p. 377] will have approximately the same probability
distribution as any similar function. Indeed, he continues,


It may further be replied that in general the reasoning does 
not require the a priori probabilities of the different values to 
be very nearly equal; it suffices that they should not be very 
unequal; and this much seems to be given by experience. 

[§8, p. 377]


Passing, in Section II (“Calculation of Probability”), to the probabil- 
ity of causes deduced from observed events, Edgeworth points out firstly 
that the principal difference between problems to which these methods are 
applicable and others 


consists in the need of evidence, other than that which is af- 
forded by the observed event, as to the probability of the alter- 
native causes existing and operating. [§44, p. 382]


Three examples follow, the first being concerned with digits drawn at ran- 
dom from mathematical tables, the second being taken from Laplace’s 
Théorie analytique des probabilités (Book II, chap. 1, N° 1), and the third 
coming from Bertrand’s Calcul des Probabilités (art. 134).

Paragraph 48 sees the start of Edgeworth’s discussion of the probability 
of testimony, two basic assumptions in which are the following: 


(1) that to each witness there pertains a coefficient of proba- 
bility representing the average frequency with which he speaks 
the truth or untruth, (2) that the statements of witnesses are 
independent in the sense proper to probabilities. [§48, p. 383] 


(These assumptions Edgeworth finds open to serious criticism.) It is shown 
that for $r$ witnesses of credibilities (or average truthfulness) $p_1, p_2, \ldots, p_r$, 
the probability that a statement is true is⁴¹ 

\[ \prod_{i=1}^{r} p_i \Big/ \Big[ \prod_{i=1}^{r} p_i + \prod_{i=1}^{r} (1 - p_i) \Big] . \]

Division of both numerator and denominator by $\prod_{i=1}^{r} p_i$ shows that this 
probability increases with $r$, provided that each $p_i > 1/2$. 
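The behaviour of this expression is easily checked numerically. The following is a minimal sketch only: the function name `prob_statement_true` and the common credibility of 3/4 are assumptions made for illustration and are not taken from Edgeworth.

```python
from fractions import Fraction
from math import prod


def prob_statement_true(credibilities):
    """Probability that a statement is true when every witness asserts it."""
    all_truthful = prod(credibilities)                  # product of the p_i
    all_mistaken = prod(1 - p for p in credibilities)   # product of the (1 - p_i)
    return all_truthful / (all_truthful + all_mistaken)


# Hypothetical case: each witness has credibility 3/4 (greater than 1/2).
values = [prob_statement_true([Fraction(3, 4)] * r) for r in range(1, 5)]
for r, value in enumerate(values, start=1):
    print(r, value)
```

With credibilities above 1/2 the printed values increase with the number of witnesses, in agreement with the remark in the text.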

The rule of succession comes under scrutiny in Paragraph 94, illustration 
being provided by the drawing of one further white ball from a mixture of 
an immense number of white and black balls, when it is known that n 
draws have all yielded white. Under the assumption of a uniform prior, 
Edgeworth obtains in the limit 


\[ \int_0^1 x^{n+1}\,dx \Big/ \int_0^1 x^{n}\,dx = (n+1)/(n+2) . \]
In Part II of his article, “Averages and Laws of Error”, Edgeworth turns 
in the first section to the law of error. He makes the perhaps somewhat 
unusual observation that 


there is a characteristic more essential to the statistician than 
the existence of an objective quaesitum, namely, the use of that 
method which is primarily, but not exclusively, proper to that 
sort of quaesitum — inverse probability. [§123, p. 395] 


Inverse probability has a two-fold use here: (a) to determine the best values 
of the coefficients appearing in the law of error, and (b) to test the 
worth of the results obtained by using any values of these coefficients. As 
an example⁴² of the procedure Edgeworth considers the case of $n$ observations 
$x_1, x_2, \ldots, x_n$ from a Normal distribution with given modulus $c$. The 
probability $P$ that the observations should have resulted from measurement 
of an object whose real position was between $x$ and $x + \Delta x$ is then 

\[ P = \Delta x \, J \exp\{ -[(x - x_1)^2 + \cdots + (x - x_n)^2]/c^2 \} , \]

where $J$ is a constant of proportionality. The most probable value of $x$ is 
found, by maximization of $P$, to be $\bar{x}$ (the arithmetic mean⁴³ of the $n$ 
observations), a statistic with modulus $c/\sqrt{n}$. It is further pointed out that 
the same reasoning is applicable to the case in which data and quaesitum 
are proportions rather than absolute quantities, 


for instance, given the percentage of white balls in several large 
batches drawn at random from an immense urn containing black 
and white balls, to find the percentage of white balls in the 
urn — the inverse problem associated with the name of Bayes. 


[§130, p. 397] 
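That the most probable value under the Normal law is the arithmetic mean may be illustrated by a crude numerical maximization. This is a sketch under stated assumptions: the observations, the modulus and the search grid below are hypothetical, and the uniform prior over positions is taken for granted.

```python
def log_probability(x, observations, c):
    """Logarithm of P, up to an additive constant, for modulus c."""
    return -sum((x - xi) ** 2 for xi in observations) / c ** 2


observations = [9.8, 10.1, 10.4, 9.9]   # hypothetical measurements
c = 0.5                                  # hypothetical modulus

# Grid search for the position x that maximizes P.
grid = [9.0 + 0.001 * k for k in range(2001)]
x_hat = max(grid, key=lambda x: log_probability(x, observations, c))
mean = sum(observations) / len(observations)
print(x_hat, mean)
```

The grid maximizer agrees with the arithmetic mean to within the grid spacing.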


Laplace, Edgeworth notes, did not adopt this approach. He saw the quaesitum 
not as the most probable value, but rather as “that point which may 
most advantageously be put for the real one” [§131, p. 397]. This necessitated 
calculation of “la valeur moyenne de l’erreur à craindre”, that is, 
“the mean first power of the errors taken positively on each side of the real 
point.” Gauss, on the other hand, took as the appropriate criterion the 
mean square of errors; and Edgeworth notes further that 


Any mean power indeed, the integral of any function which 
increases in absolute magnitude with the increase of its variable, 
taken as the measure of the detriment, will lead to the same 
conclusion, if the normal law prevails. [§131, p. 397] 


Attention is also drawn to the modifications necessary if (i) different 
values of x are not equally probable prior to observation, (ii) the x; come 
from distributions with different moduli, (iii) the modulus is also unknown, 
and (iv) the observations come from a bivariate Normal distribution with 
unknown correlation coefficient. 

Some forty years after the paper in which he had considered Venn’s book, 
Edgeworth returned to the topic in a review, also entitled “The philosophy 
of chance”, of Keynes’s A Treatise on Probability, a review published in 
Mind in 1922. In Keynes’s dialectic Edgeworth finds support for his earlier 
contention that Venn had gone too far in his scepticism as regards a priori 
probabilities based on the principle of sufficient reason (or indifference). 

Following on the recollection of some examples from Venn and some 
economic applications, Edgeworth notes that 


It may be observed that in general, for instance in all the applications 
which have just been noticed, the use of a priori 
probabilities has no connexion with inverse probability. That 
conjunction does occur in one very important branch of Proba- 
bilities — that which deals with errors-of-observation. [p. 262] 


This assertion is illustrated by an example involving several observations of 
the measure of some magnitude whose ascertainment is required, a special 
case being the estimation of a ratio (e.g. of black balls drawn from an urn) 
rather than an absolute magnitude. One requires that combination of the 
observations that yields the best value of the quaesitum. 

Further reference is made to Keynes’s views on a priori probabilities and 
the rule of succession, and on the latter Edgeworth remarks 


when the relevant a priori probabilities... are overruled by the 
number of the observations, as may be shown by the reasoning 
above cited, the Rule of Succession is by no means so absurd. 
[p. 265] 


Moreover 


a priori probability is generally negligible in comparison with 
the evidence of repeated observations [p. 266], 


which strengthens a remark made earlier. 


9.9 Charles Lutwidge Dodgson (1832-1898) 


Perhaps better known for the fanciful books and poems written under the 
nom de guerre “Lewis Carroll” than for serious mathematical work, the 
Reverend Charles Lutwidge Dodgson⁴⁴ in fact wrote a number of books in 
his professional field. While opinions of his fantastical works are uniformly 
high, views on his more sober writings are less unanimous. Eric Temple 
Bell, for example, considered Dodgson’s range of knowledge as no better 
than that of a modern first-year student at a technical school (see Lennon 
[1945]), while Warren Weaver wrote of what was perhaps Dodgson’s most 
important work on geometry, 


Euclid and His Modern Rivals must be classed as amusing, 
ridiculously opinionated and scientifically unimportant, 
[1956, p. 118] 


and further 


In all of Dodgson’s mathematical work it is evident that he was 
not an important mathematician. 
[1956, p. 120] 


However Seneta has recently done much (see his [1984] and [1993]) to draw 
attention to the important contributions made by Dodgson to linear algebra 
and the theory of determinants⁴⁵. 

Dodgson’s work on probability is limited to a number of questions in his 
pseudonymously published Pillow-Problems⁴⁶. Of the seventy-two problems 
of which this slim volume is comprised, thirteen are concerned with 
probability, and of these, twelve (numbers 5, 10, 16, 19, 23, 27, 38, 41, 45, 50, 
58 and 66) appear in the list of subjects under the heading “ALGEBRA:— 
Chances”, while Number 72 is given under “TRANSCENDENTAL PROBABILITIES”⁴⁷. 
We shall consider here⁴⁸ only those questions that deal with 
inverse probability⁴⁹. 


5. 

A bag contains one counter, known to be either white or 
black. A white counter is put in, the bag shaken, and a counter 
drawn out, which proves to be white. What is now the chance 
of drawing a white counter? [8/9/87 


[p. 2] 
Dodgson’s solution runs in full as follows: 


At first sight, it would appear that, as the state of the bag, 
after the operation, is necessarily identical with its state before 
it, the chance is just what it then was, viz. 1/2. This, however, is 
an error. 

The chances, before the addition, that the bag contains (a) 1 
white (b) 1 black, are (a) 1/2 (b) 1/2. Hence the chances, after the 
addition, that it contains (a) 2 white (b) 1 white, 1 black, are 
the same, viz. (a) 1/2 (b) 1/2. Now the probabilities, which these 2 
states give to the observed event, of drawing a white counter, 
are (a) certainty (b) 1/2. Hence the chances, after drawing the 
white counter, that the bag, before drawing, contained (a) 2 
white, (b) 1 white, 1 black, are proportional to (a) 1/2 · 1 (b) 1/2 · 1/2; 
i.e. (a) 1/2 (b) 1/4; i.e. (a) 2 (b) 1. Hence the chances are (a) 2/3 
(b) 1/3. Hence, after the removal of a white counter, the chances, 
that the bag now contains (a) 1 white (b) 1 black, are for (a) 2/3 
and for (b) 1/3. 

Thus the chance, of now drawing a white counter, is 2/3. 
Q.E.F. 
[pp. 31-32] 


This is quite correct: however, the solution becomes perhaps more trans- 
parent if we approach it as follows. After the placing of the white counter 
in the bag, we can describe the contents (in an obvious notation) by 


\[ H_1 : W(W) \quad \text{or} \quad H_2 : B(W) , \]

where “(W)” indicates that a white counter has been added. Although not 
stated in the problem, the assumption of a uniform prior seems reasonable, 
and we accordingly set 

\[ \Pr[H_1] = 1/2 = \Pr[H_2] . \]

Denoting by $O$ the drawing of a white counter, we have 

\[ \Pr[O \mid H_1] = 1 \; ; \quad \Pr[O \mid H_2] = 1/2 . \]

Now let $W_2$ denote the final drawing of a white counter. Using the theorem 
of total probability in the form 

\[ \Pr[B \mid F] = \sum_j \Pr[B \mid F \,\&\, G_j] \Pr[G_j \mid F] \]

(as Dodgson often does), we have, finally, on recalling the assumption of a 
uniform prior⁵⁰, 

\[
\begin{aligned}
\Pr[W_2 \mid O] &= \sum_i \Pr[W_2 \mid O \,\&\, H_i] \Pr[H_i \mid O] \\
&= \sum_i \Pr[W_2 \mid O \,\&\, H_i] \Pr[O \mid H_i] \Pr[H_i] \Big/ \sum_j \Pr[O \mid H_j] \Pr[H_j] \\
&= [(1 \times 1) + (0 \times 1/2)] / (1 + 1/2) = 2/3 .
\end{aligned}
\]
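The calculation for Problem 5 can be reproduced in exact arithmetic. The sketch below assumes the uniform prior discussed above; the helper `posterior` and the state labels are illustrative names, not Dodgson's.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Discrete Bayes's Theorem over a finite set of hypotheses."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


# States of the bag after the white counter has been added.
prior = {"WW": Fraction(1, 2), "BW": Fraction(1, 2)}
likelihood = {"WW": Fraction(1), "BW": Fraction(1, 2)}   # chance of drawing white
post = posterior(prior, likelihood)

# Chance that the counter left in the bag is white, in each state.
next_white = {"WW": Fraction(1), "BW": Fraction(0)}
answer = sum(next_white[h] * post[h] for h in post)
print(post, answer)
```

The result is 2/3, as in Dodgson's solution.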


16. 

There are two bags, one containing a counter, known to be 
either white or black; the other containing 1 white and 2 black. 
A white is put into the first, the bag shaken, and a counter 
drawn out, which proves to be white. Which course will now 
give the best chance of drawing a white — to draw from one of 
the two bags without knowing which it is, or to empty one bag 


into the other and then draw? [10/87 
[p. 4] 
Consider firstly the first bag. Then, exactly as in Problem 5, we have 

\[ \Pr[W_2 \mid O] = 2/3 . \]

Similarly, on considering the second bag of composition $WBB$, we see that 
the probability of obtaining a white counter is 1/3. Thus, assuming that the 
bags are equally likely to be chosen, we find that 

\[ \Pr[W_2 \mid O] = (1/2) \cdot (2/3) + (1/2) \cdot (1/3) = 1/2 . \]

If, on the other hand, the bags are combined, the resulting composition is 

\[ H_3 : WBBW , \quad \text{or} \quad H_4 : WBBB , \]

and then 

\[
\begin{aligned}
\Pr[W_2] &= \Pr[W_2 \mid H_3] \Pr[H_3] + \Pr[W_2 \mid H_4] \Pr[H_4] \\
&= (2/4) \cdot (2/3) + (1/4) \cdot (1/3) \\
&= 5/12 .
\end{aligned}
\]


The first option is thus preferable. 
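The two courses of Problem 16 may be compared directly. This sketch takes over the posterior chance 2/3 from Problem 5 and assumes, as in the text, that the two bags are equally likely to be chosen; the variable names are illustrative.

```python
from fractions import Fraction

half = Fraction(1, 2)
first_bag_white = Fraction(2, 3)    # remaining counter in bag 1 is white (Problem 5)
second_bag_white = Fraction(1, 3)   # bag 2 holds one white and two black

# Course 1: draw from one of the two bags chosen at random.
random_bag = half * first_bag_white + half * second_bag_white

# Course 2: empty one bag into the other (four counters) and then draw.
combined = first_bag_white * Fraction(2, 4) + (1 - first_bag_white) * Fraction(1, 4)

print(random_bag, combined)
```

The first course gives 1/2 and the second 5/12.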


19. 

There are 3 bags; one containing a white counter and a black 
one, another two white and a black, and the third 3 white and 
a black. It is not known in what order the bags are placed. A 
white counter is drawn from one of them, and a black from 
another. What is the chance of drawing a white counter from 
the remaining bag? 

[p. 5] 


Denoting the bags by $A$, $B$ and $C$ respectively, Dodgson assigns each of 
the $3!$ arrangements probability 1/6. Let $O$ denote the observed event, let $W_2$ 
denote the final drawing of a white counter, and let $H_1, H_2, \ldots, H_6$ denote 
the arrangements $ABC$, $ACB$, $BAC$, $BCA$, $CAB$, $CBA$ respectively. Then 

\[
\begin{aligned}
\Pr[O \mid H_1] &= (1/2) \cdot (1/3) = 1/6 \; ; & \Pr[O \mid H_2] &= 1/8 \\
\Pr[O \mid H_3] &= 1/3 \; ; & \Pr[O \mid H_4] &= 1/6 \\
\Pr[O \mid H_5] &= 3/8 \; ; & \Pr[O \mid H_6] &= 1/4 .
\end{aligned}
\]

Using the theorem of total probability as in Problem 5, we therefore have 

\[
\begin{aligned}
\Pr[W_2 \mid O] &= \sum_{i=1}^{6} \Pr[W_2 \mid O \,\&\, H_i] \Pr[H_i \mid O] \\
&= \sum_{i=1}^{6} \Pr[W_2 \mid O \,\&\, H_i] \Pr[O \mid H_i] \Pr[H_i] \Big/ \sum_{j=1}^{6} \Pr[O \mid H_j] \Pr[H_j] \\
&= \sum_{i=1}^{6} \Pr[W_2 \mid O \,\&\, H_i] \Pr[O \mid H_i] \Big/ \sum_{j=1}^{6} \Pr[O \mid H_j] ,
\end{aligned}
\]

on recalling the assumption of a uniform prior. Substitution of the above 
figures then leads to the evaluation of $\Pr[W_2 \mid O]$ as 

\[ \frac{(3/4 \times 1/6) + (2/3 \times 1/8) + (3/4 \times 1/3) + (1/2 \times 1/6) + (2/3 \times 3/8) + (1/2 \times 1/4)}{1/6 + 1/8 + 1/3 + 1/6 + 3/8 + 1/4} = 11/17 . \]
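The sum over the six arrangements in Problem 19 is conveniently checked by enumeration. In this sketch each arrangement is read as (bag yielding the white, bag yielding the black, remaining bag), with all arrangements equally probable; the dictionary `white` is an illustrative name.

```python
from fractions import Fraction
from itertools import permutations

# Chance of drawing white from each bag: A = 1W 1B, B = 2W 1B, C = 3W 1B.
white = {"A": Fraction(1, 2), "B": Fraction(2, 3), "C": Fraction(3, 4)}

numerator = Fraction(0)
denominator = Fraction(0)
for first, second, third in permutations("ABC"):
    likelihood = white[first] * (1 - white[second])   # Pr[O | arrangement]
    numerator += white[third] * likelihood
    denominator += likelihood

answer = numerator / denominator
print(answer)
```

The uniform prior cancels, and the enumeration gives 11/17.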


23. 

A bag contains 2 counters, each of which is known to be 
black or white. 2 white and a black are put in, and 2 white and 
a black drawn out. Then a white is put in, and a white drawn 
out. What is the chance that it now contains 2 white? [25/9/87 


[pp. 5-6] 


The first thing to note, in considering Dodgson’s solution, is the assump- 
tion that the initial states 


\[ WW \; ; \quad WB \; ; \quad BB \]

are taken to have a binomial prior distribution. This is in marked distinction 
to the assumption made in the previous problems when the prior 
distribution was taken to be uniform; though, of course, with reference 
to Problem 5 one notes that if $Y$ is a random variable taking on the values 
0 and 1 each with probability 1/2 then 

\[ Y \sim U(\{0, 1\}) \iff Y \sim b(1, 1/2) . \]

After the insertion of the three counters, the states become 

\[ H_1 : WW(WWB) \; ; \quad H_2 : WB(WWB) \; ; \quad H_3 : BB(WWB) , \]

the binomial probabilities giving 

\[ \Pr[H_1] = 1/4 \; ; \quad \Pr[H_2] = 1/2 \; ; \quad \Pr[H_3] = 1/4 . \]

Let $O_1$ and $O_2$ denote the two drawings. Then 

\[
\begin{aligned}
\Pr[O_1 \mid H_1] &= \binom{4}{2}\binom{1}{1} \Big/ \binom{5}{3} = 3/5 \\
\Pr[O_1 \mid H_2] &= \binom{3}{2}\binom{2}{1} \Big/ \binom{5}{3} = 3/5 \\
\Pr[O_1 \mid H_3] &= \binom{2}{2}\binom{3}{1} \Big/ \binom{5}{3} = 3/10 .
\end{aligned}
\]

On applying the discrete Bayes’s Theorem we obtain the posterior distribution 

\[ \Pr[H_1 \mid O_1] = 2/7 \; ; \quad \Pr[H_2 \mid O_1] = 4/7 \; ; \quad \Pr[H_3 \mid O_1] = 1/7 . \]

The (random) withdrawal of one black and two white counters and the 
(deterministic) replacement of a white counter result in the compositions 

\[ H_4 : WWW \; ; \quad H_5 : WWB \; ; \quad H_6 : WBB . \]

Using the posterior distribution just obtained as the prior for the second 
drawing we have 

\[
\begin{aligned}
\Pr[O_2 \mid H_4] &= \binom{3}{1} \Big/ \binom{3}{1} = 1 \\
\Pr[O_2 \mid H_5] &= \binom{2}{1}\binom{1}{0} \Big/ \binom{3}{1} = 2/3 \\
\Pr[O_2 \mid H_6] &= \binom{1}{1}\binom{2}{0} \Big/ \binom{3}{1} = 1/3 .
\end{aligned}
\]

By a further application of Bayes’s Theorem we get 

\[ \Pr[H_4 \mid O_2] = 6/15 \; ; \quad \Pr[H_5 \mid O_2] = 8/15 \; ; \quad \Pr[H_6 \mid O_2] = 1/15 , \]


and the chance that the bag now contains two white counters is thus 6/15. 
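The two successive applications of Bayes's Theorem in Problem 23 can be sketched as follows. The states are indexed by the number `w` of white counters among the original two, with the binomial prior that Dodgson assumes; the helper `bayes` is an illustrative name.

```python
from fractions import Fraction
from math import comb


def bayes(prior, likelihood):
    """Discrete Bayes's Theorem over a finite set of hypotheses."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


# w = number of white counters among the original two; binomial prior b(2, 1/2).
prior = {w: Fraction(comb(2, w), 4) for w in (2, 1, 0)}

# After adding W, W, B the bag holds w + 2 white and 3 - w black;
# first observation: 2 white and 1 black drawn out of the 5.
like1 = {w: Fraction(comb(w + 2, 2) * comb(3 - w, 1), comb(5, 3)) for w in prior}
post1 = bayes(prior, like1)

# After that removal and the addition of one white: w + 1 white out of 3;
# second observation: a white counter is drawn.
like2 = {w: Fraction(w + 1, 3) for w in prior}
post2 = bayes(post1, like2)

answer = post2[2]   # the bag finally holds w white counters; we want w = 2
print(post1, post2, answer)
```

The first posterior is (2/7, 4/7, 1/7) and the final chance is 6/15, that is 2/5.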


27. 

There are 3 bags, each containing 6 counters; one contains 5 
white and one black; another, 4 white and 2 black; the third, 3 
white and 3 black. From two of the bags (it is not known which) 
2 counters are drawn, and prove to be black and white. What is 
the chance of drawing a white counter from the remaining bag? 

[4/3/80 
[pp. 6-7] 
Let the bags be denoted by $A$, $B$ and $C$ respectively, the subscript $r$ attached 
to any one of these letters indicating that that bag remains after 
the (first) drawing of one white and one black counter, an observed event 
that we shall denote by $O_1$. Then 

\[
\begin{aligned}
\Pr[O_1 \mid A_r] &= \Pr[\text{white from } B \,\&\, \text{black from } C] + \Pr[\text{black from } B \,\&\, \text{white from } C] \\
&= (4/6) \cdot (3/6) + (2/6) \cdot (3/6) = 18/36 ,
\end{aligned}
\]

and similarly, 

\[ \Pr[O_1 \mid B_r] = 18/36 \; ; \quad \Pr[O_1 \mid C_r] = 14/36 . \]

Each of these three probabilities is in fact multiplied by 1/2 in Dodgson’s 
solution (some sort of averaging?): this factor in fact cancels out in the 
subsequent steps. 

Using Dodgson’s assumed uniform prior $(1/3, 1/3, 1/3)$ over the “remaining” 
bags, we obtain, by a discrete Bayes’s Theorem, 

\[ \Pr[A_r \mid O_1] = \frac{\Pr[O_1 \mid A_r] \Pr[A_r]}{\Pr[O_1 \mid A_r] \Pr[A_r] + \Pr[O_1 \mid B_r] \Pr[B_r] + \Pr[O_1 \mid C_r] \Pr[C_r]} = 9/25 , \]

\[ \Pr[B_r \mid O_1] = 9/25 \; ; \quad \Pr[C_r \mid O_1] = 7/25 . \]

On letting $O_2$ denote the drawing of a white counter from the remaining 
bag we have 

\[
\begin{aligned}
\Pr[O_2 \mid O_1] &= \Pr[O_2 \mid A_r \,\&\, O_1] \Pr[A_r \mid O_1] + \Pr[O_2 \mid B_r \,\&\, O_1] \Pr[B_r \mid O_1] \\
&\quad + \Pr[O_2 \mid C_r \,\&\, O_1] \Pr[C_r \mid O_1] \\
&= (5/6) \cdot (9/25) + (4/6) \cdot (9/25) + (3/6) \cdot (7/25) \\
&= 17/25 .
\end{aligned}
\]
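The same scheme for Problem 27 may be sketched in a few lines, assuming a uniform prior over which bag remains; the names `white` and `likelihood` are illustrative.

```python
from fractions import Fraction

# Chance of drawing white from each bag of six counters.
white = {"A": Fraction(5, 6), "B": Fraction(4, 6), "C": Fraction(3, 6)}


def likelihood(remaining):
    """Chance that the other two bags yield one white and one black counter."""
    x, y = [bag for bag in white if bag != remaining]
    return white[x] * (1 - white[y]) + (1 - white[x]) * white[y]


likes = {bag: likelihood(bag) for bag in white}
total = sum(likes.values())
post = {bag: likes[bag] / total for bag in white}   # uniform prior cancels

answer = sum(white[bag] * post[bag] for bag in white)
print(post, answer)
```

The posterior over the remaining bag is (9/25, 9/25, 7/25) and the required chance is 17/25.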


38. 

There are 3 bags, ‘A’, ‘B’, and ‘C’. ‘A’ contains 3 red counters, 
‘B’ 2 red and one white, ‘C’ one red and 2 white. Two 
bags are taken at random, and a counter drawn from each: both 
prove to be red. The counters are replaced, and the experiment 
is repeated with the same two bags: one proves to be red. What 
is the chance of the other being red? [3/76 


[p. 9] 
Dodgson first of all considers the (ordered) arrangements $ABC$, $ACB$, 
$BAC$, $BCA$, $CAB$, $CBA$, the first letter representing the bag from which 
the unknown counter is taken, and the second letter representing the bag 
from which two red are drawn. If we denote by $O$ the observed event (i.e. 
the drawing of a red counter), then 

\[
\begin{aligned}
\Pr[O \mid ABC] &= 1 \cdot (2/3)^2 = 4/9 \; ; & \Pr[O \mid ACB] &= 1 \cdot (1/3)^2 = 1/9 \\
\Pr[O \mid BAC] &= (2/3) \cdot (1)^2 = 2/3 \; ; & \Pr[O \mid BCA] &= (2/3) \cdot (1/3)^2 = 2/27 \\
\Pr[O \mid CAB] &= (1/3) \cdot (1)^2 = 1/3 \; ; & \Pr[O \mid CBA] &= (1/3) \cdot (2/3)^2 = 4/27 .
\end{aligned}
\]

Then, using a discrete form of Bayes’s Theorem (and assuming that the six 
possible arrangements are equally probable), we have 

\[
\begin{aligned}
\Pr[ABC \mid O] &= 12/48 \; ; & \Pr[ACB \mid O] &= 3/48 \\
\Pr[BAC \mid O] &= 18/48 \; ; & \Pr[BCA \mid O] &= 2/48 \\
\Pr[CAB \mid O] &= 9/48 \; ; & \Pr[CBA \mid O] &= 4/48 .
\end{aligned}
\]

On denoting by $R^*$ the event that the unknown counter is red, we thus have 

\[
\begin{aligned}
\Pr[R^* \mid O] &= \Pr[R^* \mid O \,\&\, ABC] \times \Pr[ABC \mid O] + \cdots + \Pr[R^* \mid O \,\&\, CBA] \times \Pr[CBA \mid O] \\
&= 49/72 .
\end{aligned}
\]


Now there is clearly something wrong here, for if we suppose that the 
unknown counter is white (an event that we shall denote by W*) rather 
than red, then an argument similar to that given above yields 


\[ \Pr[W^* \mid O] = 31/54 \neq 1 - \Pr[R^* \mid O] . \]

A careful examination of $\Pr[O \mid BCA]$, for example, will show how the error 
arises. For 

\[ \Pr[O \mid BCA] = \Pr[R \text{ drawn from 2nd bag} \mid BCA] , \]

while 

\[ (2/3) \cdot (1/3)^2 = \Pr[R \text{ from } B] \times \{\Pr[R \text{ from } C]\}^2 , \]

and these are not equivalent. 
A correct proof runs as follows: consider ordered arrangements of the 
bags, as before, and let $O_1$ be the first observed event, i.e. the drawing 
of a red counter from each of two bags. Then 

\[
\begin{aligned}
\Pr[O_1 \mid ABC] &= 1 \cdot (2/3) \; ; & \Pr[O_1 \mid ACB] &= 1 \cdot (1/3) \\
\Pr[O_1 \mid BAC] &= (2/3) \cdot 1 \; ; & \Pr[O_1 \mid BCA] &= (2/3) \cdot (1/3) \\
\Pr[O_1 \mid CAB] &= (1/3) \cdot 1 \; ; & \Pr[O_1 \mid CBA] &= (1/3) \cdot (2/3) .
\end{aligned}
\]

Again assuming that these ordered arrangements are equally probable, we 
find that 

\[
\begin{aligned}
\Pr[ABC \mid O_1] &= 6/22 \; ; & \Pr[ACB \mid O_1] &= 3/22 \\
\Pr[BAC \mid O_1] &= 6/22 \; ; & \Pr[BCA \mid O_1] &= 2/22 \\
\Pr[CAB \mid O_1] &= 3/22 \; ; & \Pr[CBA \mid O_1] &= 2/22 .
\end{aligned}
\]

The counters originally drawn are now replaced, and a second counter is 
drawn from each of the two bags sampled before. Let $O_2$ denote the event 
that, a red counter having been drawn from the second bag, a red counter 
will also be drawn from the first. Then 

\[
\begin{aligned}
\Pr[O_2 \mid ABC \,\&\, O_1] &= \Pr[RR \mid ABC \,\&\, O_1] = 1 \cdot (2/3) \\
\Pr[O_2 \mid ACB \,\&\, O_1] &= 1 \cdot (1/3) \\
\Pr[O_2 \mid BAC \,\&\, O_1] &= (2/3) \cdot 1 \\
\Pr[O_2 \mid BCA \,\&\, O_1] &= (2/3) \cdot (1/3) \\
\Pr[O_2 \mid CAB \,\&\, O_1] &= (1/3) \cdot 1 \\
\Pr[O_2 \mid CBA \,\&\, O_1] &= (1/3) \cdot (2/3) .
\end{aligned}
\]

Thus 

\[
\begin{aligned}
\Pr[O_2 \mid O_1] &= \Pr[O_2 \mid ABC \,\&\, O_1] \times \Pr[ABC \mid O_1] + \cdots \\
&= (2/3 \times 6/22) + (1/3 \times 3/22) + (2/3 \times 6/22) \\
&\quad + (2/9 \times 2/22) + (1/3 \times 3/22) + (2/9 \times 2/22) \\
&= 49/99 ,
\end{aligned}
\]

that is 

\[ \Pr[RR \mid O_1] = 49/99 . \]




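Similarly one finds that 

\[ \Pr[RW \mid O_1] = 23/99 = \Pr[WR \mid O_1] \; ; \]

and hence 

\[
\begin{aligned}
\Pr[\text{two red} \mid \text{at least one red} \,\&\, O_1] &= \frac{\Pr[RR \mid O_1]}{\Pr[RR \mid O_1] + \Pr[WR \mid O_1] + \Pr[RW \mid O_1]} \\
&= 49/95 .
\end{aligned}
\]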


(For an alternative proof using unordered arrangements of bags see Seneta 


[1984, p. 87].) 
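The corrected argument for Problem 38 can be verified by enumerating the ordered pairs of bags actually sampled. This is a sketch assuming, as above, that the ordered arrangements are equally probable; the names `red`, `p_rr` and `p_some_red` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Chance of drawing red from each bag.
red = {"A": Fraction(1), "B": Fraction(2, 3), "C": Fraction(1, 3)}

pairs = list(permutations("ABC", 2))   # ordered pair of bags sampled
like = {(x, y): red[x] * red[y] for x, y in pairs}   # first experiment: both red
total = sum(like.values())
post = {pair: like[pair] / total for pair in pairs}

# Second experiment with the same two bags, counters having been replaced.
p_rr = sum(red[x] * red[y] * post[(x, y)] for x, y in pairs)
p_some_red = sum((1 - (1 - red[x]) * (1 - red[y])) * post[(x, y)] for x, y in pairs)

answer = p_rr / p_some_red
print(p_rr, p_some_red, answer)
```

The enumeration gives 49/99 for two reds and 49/95 for the conditional chance.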


41. 

My friend brings me a bag containing four counters, each of 
which is either black or white. He bids me draw two, both of 
which prove to be white. He then says “I meant to tell you, 
before you began, that there was at least one white counter in 
the bag. However, you know it now, without my telling you. 
Draw again.” 


(1) What is now my chance of drawing white? 
(2) What would it have been, if he had not spoken? [9/87 


[pp. 9-10] 


To answer the second question, Dodgson supposes that the bag has one 
of the following five compositions: 


WWWW ; WWWB; WWBB; WBBB ; BBBB. 


He assumes too that the number of white counters $X \sim b(4, 1/2)$. If $O$ denotes 
the observed event (i.e. the drawing of two white counters), and $E$ denotes 
the event expected, then 

\[
\begin{aligned}
\Pr[O \mid WWWW] &= 1 \; ; & \Pr[O \mid WWWB] &= 1/2 \\
\Pr[O \mid WWBB] &= 1/6 \; ; & \Pr[O \mid WBBB] &= 0 \\
\Pr[O \mid BBBB] &= 0 .
\end{aligned}
\]

Thus, by the discrete Bayes’s Theorem, 

\[
\begin{aligned}
\Pr[WWWW \mid O] &= 1/4 \; ; \quad \Pr[WWWB \mid O] = 1/2 \; ; \quad \Pr[WWBB \mid O] = 1/4 \\
\Pr[WBBB \mid O] &= 0 \; ; \quad \Pr[BBBB \mid O] = 0 .
\end{aligned}
\]

Hence, under the tacit assumption that the counters comprising $O$ are not 
replaced after having been drawn, we have 

\[
\begin{aligned}
\Pr[E \mid O] &= \Pr[E \mid WWWW \,\&\, O] \times \Pr[WWWW \mid O] + \cdots + \Pr[E \mid BBBB \,\&\, O] \times \Pr[BBBB \mid O] \\
&= (1 \times (1/4)) + ((1/2) \times (1/2)) \\
&= 1/2 .
\end{aligned}
\]


The validity of Dodgson’s solution to the first question, however, is moot. 
He begins by saying 


As there was certainly at least one W in the bag at first, the 
‘a priori’ chances for the various states of the bag, ‘WWWW, 
WWWB, WWBB, WBBB’, were ‘1/8, 3/8, 3/8, 1/8’. 

[p. 62] 


These last four fractions suggest the use of a binomial distribution $b(n, p)$ 
with $n = 3$ and $p = 1/2$, and it seems then that Dodgson is supposing that 
each possible bag has one white counter, and that the number of remaining 
white counters $Y \sim b(3, 1/2)$. With this interpretation Dodgson shows (or, 
more accurately, states) first that 

\[
\begin{aligned}
\Pr[O \mid WWWW] &= 1 \; ; & \Pr[O \mid WWWB] &= 1/2 \\
\Pr[O \mid WWBB] &= 1/6 \; ; & \Pr[O \mid WBBB] &= 0 .
\end{aligned}
\]

The application of Bayes’s Theorem then yields 

\[ \Pr[WWWW \mid O] = 1/3 \; ; \quad \Pr[WWWB \mid O] = 1/2 \; ; \quad \Pr[WWBB \mid O] = 1/6 , \]

and thus 

\[
\begin{aligned}
\Pr[E \mid O] &= \Pr[E \mid WWWW \,\&\, O] \times \Pr[WWWW \mid O] + \cdots + \Pr[E \mid WWBB \,\&\, O] \times \Pr[WWBB \mid O] \\
&= (1 \times (1/3)) + ((1/2) \times (1/2)) \\
&= 7/12 .
\end{aligned}
\]


It seems debatable, however, whether the assumption made by Dodgson 
as to the distribution of the possible constitution of the bag is correct. For 
if one supposes that one counter is known to be white, can one really say 
of the counters “each of which is either black or white” and use $b(3, 1/2)$? 
Following the interpretation given by Seneta [1984], we shall consider 


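\[ \Pr[\text{at least one white counter}] = 1 - \Pr[\text{no white counter}] = 1 - \binom{4}{0} (1/2)^4 = 15/16 . \]

Then, given the friendly advice, we have 

\[
\begin{aligned}
\Pr[WWWW \mid \text{at least one } W] &= \frac{\Pr[WWWW]}{\Pr[\text{at least one } W]} = (1/2)^4 \Big/ (15/16) = 1/15 , \\
\Pr[WWWB \mid \text{at least one } W] &= 4/15 , \\
\Pr[WWBB \mid \text{at least one } W] &= 6/15 , \\
\Pr[WBBB \mid \text{at least one } W] &= 4/15 , \\
\Pr[BBBB \mid \text{at least one } W] &= 0 .
\end{aligned}
\]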


Proceeding as in our earlier discussion, and again assuming the non- 
replacement of the two white counters first drawn, we have 


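\[
\begin{aligned}
\Pr[WWWW \mid O] &= \frac{1 \times 1/15}{(1 \times 1/15) + (1/2 \times 4/15) + (1/6 \times 6/15)} = 1/4 , \\
\Pr[WWWB \mid O] &= 1/2 , \\
\Pr[WWBB \mid O] &= 1/4 .
\end{aligned}
\]

These values coinciding with those obtained before, we find once again that 
$\Pr[E \mid O] = 1/2$. 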
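The three priors considered for Problem 41 may be compared side by side. This is a sketch only: the function `chance_of_white` and the dictionary names are illustrative, and the two white counters first drawn are assumed not to be replaced.

```python
from fractions import Fraction
from math import comb


def chance_of_white(prior):
    """prior maps the number k of white counters (out of 4) to its probability."""
    # Likelihood of drawing two whites, without replacement, from k white and 4 - k black.
    joint = {k: p * Fraction(comb(k, 2), comb(4, 2)) for k, p in prior.items()}
    total = sum(joint.values())
    posterior = {k: j / total for k, j in joint.items()}
    # Two whites are gone: k - 2 whites remain among the 2 counters left.
    return sum(Fraction(k - 2, 2) * q for k, q in posterior.items() if q > 0)


# Question (2): binomial prior b(4, 1/2) on the number of white counters.
binomial = {k: Fraction(comb(4, k), 16) for k in range(5)}
# Question (1), following Seneta: the same prior conditioned on at least one white.
at_least_one = {k: p / (1 - binomial[0]) for k, p in binomial.items() if k >= 1}
# Question (1), Dodgson's reading: one counter certainly white, the rest b(3, 1/2).
dodgson = {1 + y: Fraction(comb(3, y), 8) for y in range(4)}

print(chance_of_white(binomial), chance_of_white(at_least_one), chance_of_white(dodgson))
```

The first two priors both give 1/2, while Dodgson's prior gives 7/12.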


50. 

There are 2 bags, H and K, each containing 2 counters: and 
it is known that each counter is either black or white. A white 
counter is added to bag H, the bag is shaken up, and one counter 
transferred (without looking at it) to bag K, where the process 
is repeated, a counter being transferred to bag H. What is now 
the chance of drawing a white counter from bag H? 


[p. 11] 


This problem, viewed by Seneta as “perhaps the most complex problem 
of the set” [1984, p. 88], is undated: more than a page is devoted to its 
solution by Dodgson. Again it is initially supposed that the number of 
white counters in bag $H$ has the binomial distribution $b(2, 1/2)$. Thus, on 
the addition of a white counter to $H$, the possible compositions 

\[ WWW \; ; \quad WBW \; ; \quad WBB \]

have probabilities 1/4, 1/2 and 1/4 respectively, and hence (again by the theorem 
of total probability) 

\[ \Pr[\text{white drawn from } H] = (1/4) \times 1 + (1/2) \times (2/3) + (1/4) \times (1/3) = 8/12 , \]

the chance of a black counter’s being drawn therefore being 4/12. 
This drawn counter (of unknown colour) having been placed in bag $K$, 
the possible states of the latter container are 

\[ WWW \; ; \quad WWB \; ; \quad WBB \; ; \quad BBB , \]

with 

\[
\begin{aligned}
\Pr[WWW] &= \Pr[WW \text{ originally in } K \text{ and } W \text{ transferred}] \\
&= (1/4) \times (2/3) = 1/6 , \\
\Pr[WWB] &= \Pr[WW \text{ originally in } K \text{ and } B \text{ transferred}] \\
&\quad + \Pr[WB \text{ originally in } K \text{ and } W \text{ transferred}] \\
&= (1/4) \times (1/3) + (1/2) \times (2/3) = 5/12 , \\
\Pr[WBB] &= \Pr[WB \text{ originally in } K \text{ and } B \text{ transferred}] \\
&\quad + \Pr[BB \text{ originally in } K \text{ and } W \text{ transferred}] \\
&= (1/2) \times (1/3) + (1/4) \times (2/3) = 1/3 , \\
\Pr[BBB] &= \Pr[BB \text{ originally in } K \text{ and } B \text{ transferred}] \\
&= (1/4) \times (1/3) = 1/12 .
\end{aligned}
\]

Thus 

\[ \Pr[\text{white drawn from } K] = (1 \times 1/6) + (2/3 \times 5/12) + (1/3 \times 1/3) = 5/9 , \]

the chance of drawing a black counter thus being 4/9. 


This transference of a counter from $H$ to $K$ leaves the former in one of 
the states $WW$, $WB$ or $BB$, the probabilities being given by 

\[
\begin{aligned}
\Pr[WW] &= \Pr[WWW \text{ and } W \text{ transferred}] + \Pr[WWB \text{ and } B \text{ transferred}] \\
&= (1/4) \times 1 + (1/2) \times (1/3) = 5/12 , \\
\Pr[WB] &= \Pr[WWB \text{ and } W \text{ transferred}] + \Pr[WBB \text{ and } B \text{ transferred}] \\
&= (1/2) \times (2/3) + (1/4) \times (2/3) = 1/2 , \\
\Pr[BB] &= \Pr[WBB \text{ and } W \text{ transferred}] \\
&= (1/4) \times (1/3) = 1/12 .
\end{aligned}
\]

The next stage in the process consists in the transferring of a counter from 
$K$ back to $H$. The chances of the possible compositions of $H$ are then 

\[
\begin{aligned}
\Pr[WWW] &= \Pr[WW \text{ in } H] \times \Pr[W \text{ transferred}] \\
&= (5/12) \times (5/9) = 25/108 , \\
\Pr[WWB] &= \Pr[WW \text{ in } H] \times \Pr[B \text{ transferred}] + \Pr[WB \text{ in } H] \times \Pr[W \text{ transferred}] \\
&= (5/12) \times (4/9) + (1/2) \times (5/9) = 50/108 , \\
\Pr[WBB] &= \Pr[WB \text{ in } H] \times \Pr[B \text{ transferred}] + \Pr[BB \text{ in } H] \times \Pr[W \text{ transferred}] \\
&= (1/2) \times (4/9) + (1/12) \times (5/9) = 29/108 , \\
\Pr[BBB] &= \Pr[BB \text{ in } H] \times \Pr[B \text{ transferred}] \\
&= (1/12) \times (4/9) = 4/108 .
\end{aligned}
\]

Thus, finally, 

\[ \Pr[W \text{ drawn from } H] = (25/108 \times 1) + (50/108 \times 2/3) + (29/108 \times 1/3) = 17/27 . \]
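As an independent check on the final figure for Problem 50, the whole process can be enumerated jointly, without treating the colour of the counter returned from K as independent of the state of H. The sketch assumes independent b(2, 1/2) priors on the white counters in the two bags and that no counter is added to K other than the one transferred; the loop variables are illustrative.

```python
from fractions import Fraction
from math import comb

answer = Fraction(0)
for h in range(3):                # white counters originally in H
    for k in range(3):            # white counters originally in K
        p0 = Fraction(comb(2, h) * comb(2, k), 16)
        h1 = h + 1                # whites in H (of 3) once a white is added
        for t1 in (0, 1):         # t1 = 1 if a white counter goes from H to K
            p1 = Fraction(h1, 3) if t1 else 1 - Fraction(h1, 3)
            h2, k2 = h1 - t1, k + t1          # H now has 2 counters, K has 3
            for t2 in (0, 1):     # t2 = 1 if a white counter goes from K to H
                p2 = Fraction(k2, 3) if t2 else 1 - Fraction(k2, 3)
                h3 = h2 + t2      # whites among the 3 counters finally in H
                answer += p0 * p1 * p2 * Fraction(h3, 3)

print(answer)
```

The joint enumeration also yields 17/27.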


In his introduction Dodgson wrote 


every one of them [i.e. the “Pillow-Problems”] was worked out, 
to the very end, before drawing any diagram or writing down a 
single word of the solution. 


[p. x] 


The solution discussed here shows evidence of the considerable mental ca- 
pacity and patience he must have possessed. 


66. 

Given that there are 2 counters in a bag, as to which all that 
was originally known was that each was either white or black. 
Also given that the experiment has been tried, a certain number 
of times, of drawing a counter, looking at it, and replacing it; 
that it has been white every time; and that, as a result, the 
chance of drawing white, next time, is α/(α + β). Also given 
that the same experiment is repeated m times more, and that 
it still continues to be white every time. What would then be 
the chance of drawing white? [9/89 


[p. 15] 


Were he to proceed as before, Dodgson should suppose that the number 
of white counters in the bag has the binomial distribution $b(2, 1/2)$. However, 
he starts his solution by setting 

\[ \Pr[WW] = x , \quad \Pr[WB] = 1 - x , \]

and then he notes that 

\[ \Pr[W \text{ drawn}] = x + (1 - x) \times (1/2) . \]

On setting this sum equal to $\alpha/(\alpha + \beta)$, he gets 

\[ x = 2\alpha/(\alpha + \beta) - 1 = (\alpha - \beta)/(\alpha + \beta) . \]

This posterior distribution for the first part of the problem is then used 
as the prior in an application of Bayes’s Theorem to obtain the posterior 
distribution required. 

Dodgson considers what the posterior would be after one and two further 
repetitions of the experiment, and then uses mathematical induction to 
obtain the result for m repetitions. As Seneta [1984, p. 89] has noted, 
however, it seems easier (and again one might wonder whether Dodgson 
did in fact carry out his complicated solution mentally) to use the theorem 
of total probability and Bayes’s Theorem. For we have 


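\[
\begin{aligned}
\Pr[m \text{ white drawn}] &= \Pr[m \text{ white drawn} \mid WW] \times \Pr[WW] + \Pr[m \text{ white drawn} \mid WB] \times \Pr[WB] \\
&= \frac{\alpha - \beta}{\alpha + \beta} + \frac{1}{2^m} \cdot \frac{2\beta}{\alpha + \beta} ,
\end{aligned}
\]

and hence 

\[
\begin{aligned}
\Pr[1 \text{ further white drawn} \mid m \text{ white drawn}] &= \Pr[(m + 1) \text{ white drawn}] / \Pr[m \text{ white drawn}] \\
&= \Big( \frac{\alpha - \beta}{\alpha + \beta} + \frac{1}{2^{m+1}} \cdot \frac{2\beta}{\alpha + \beta} \Big) \Big/ \Big( \frac{\alpha - \beta}{\alpha + \beta} + \frac{1}{2^{m}} \cdot \frac{2\beta}{\alpha + \beta} \Big) \\
&= \frac{2^m (\alpha - \beta) + \beta}{2^m (\alpha - \beta) + 2\beta} ,
\end{aligned}
\]

which is Dodgson’s solution. 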
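The closed form for Problem 66 may be compared with the ratio of total probabilities from which it is derived. In this sketch `chance_after` and `closed_form` are illustrative names, and α and β are taken to be positive integers with α > β purely so that exact fractions can be used.

```python
from fractions import Fraction


def chance_after(alpha, beta, m):
    """Ratio Pr[(m + 1) white drawn] / Pr[m white drawn], starting from Pr[WW] = x."""
    x = Fraction(alpha - beta, alpha + beta)

    def all_white(k):
        # Theorem of total probability over the states WW and WB.
        return x + (1 - x) * Fraction(1, 2 ** k)

    return all_white(m + 1) / all_white(m)


def closed_form(alpha, beta, m):
    """Dodgson's result for the chance of white after m further white drawings."""
    return Fraction(2 ** m * (alpha - beta) + beta, 2 ** m * (alpha - beta) + 2 * beta)


for alpha, beta in [(3, 1), (5, 2), (7, 3)]:
    for m in range(6):
        print(alpha, beta, m, chance_after(alpha, beta, m), closed_form(alpha, beta, m))
```

For m = 0 both expressions reduce to α/(α + β), the chance given in the problem.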
Strictly speaking, the following — and the final — problem does not 


require the use of Bayes’s Theorem: however, there is some kind of inverse 
reasoning involved, and we accordingly adduce it here. 


72. 
A bag contains 2 counters, as to which nothing is known 
except that each is either black or white. Ascertain their colours 
without taking them out of the bag. [8/9/87 


[p. 18] 


This is Dodgson’s “Problem in Transcendental Probabilities”: in view of 
the curious nature of his solution, we present it here in full. 


We know that, if a bag contained 3 counters, 2 being black 
and one white, the chance of drawing a black one would be 2/3; 
and that any other state of things would not give this chance. 

Now the chances, that the given bag contains (α) BB, (β) 
BW, (γ) WW, are respectively 1/4, 1/2, 1/4. 

Add a black counter. 

Then the chances, that it contains (α) BBB, (β) BWB, (γ) 
WWB, are, as before, 1/4, 1/2, 1/4. 

Hence the chance, of now drawing a black one, 

= (1/4) · 1 + (1/2) · (2/3) + (1/4) · (1/3) = 2/3 . 

Hence the bag now contains BBW (since any other state of 
things would not give this chance). 

Hence, before the black counter was added, it contained BW, 
i.e. one black counter and one white. Q.E.F. 


[p. 109] 


Crudely put, Dodgson’s solution amounts to concluding that if the probabilities 
$\Pr[B \mid H_i]$ and $\Pr[B]$ are equal, then $H_i$ necessarily obtains. In his 
paper commemorating the centenary of Dodgson’s birth, Eperson notes in 
connexion with this problem 


that if one applies a similar argument to the case of a bag con- 
taining 3 unknown counters, black or white, one reaches the still 
more paradoxical conclusion that there cannot be 3 counters in 


the bag! [1933, p. 99] 


Seneta [1984, pp. 89-90] shows in fact that in this latter case the prior 
(binomial) distribution on the number of black counters is $\{1/8, 3/8, 3/8, 1/8\}$, 
while, after the addition of a black counter, the probability of drawing a 
black is 5/8, which does not coincide with any of the prior values. (A completely 
analogous argument holds in the case of the addition of a white 
counter.) We shall present a more general argument here⁵¹. 

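Consider a bag containing $n$ counters, each of which may be either white 
or black. Then the number of black counters $X \sim b(n, 1/2)$. After the addition 
of one further black counter, the possible states of the bag are 

\[ \underbrace{B \ldots B}_{n}(B) \; , \quad \underbrace{B \ldots B}_{n-1}W(B) \; , \; \ldots \; , \quad \underbrace{W \ldots W}_{n}(B) \]

with respective (prior) probabilities 

\[ \binom{n}{0} (1/2)^n \; , \quad \binom{n}{1} (1/2)^n \; , \; \ldots \; , \quad \binom{n}{n} (1/2)^n . \]

The respective probabilities of drawing a black counter are 

\[ \frac{n+1}{n+1} \; , \quad \frac{n}{n+1} \; , \; \ldots \; , \quad \frac{n - (n-1)}{n+1} . \]

Thus, by the theorem of total probability, 

\[
\begin{aligned}
\Pr[B] &= \frac{1}{2^n} \binom{n}{0} \frac{n+1}{n+1} + \frac{1}{2^n} \binom{n}{1} \frac{n}{n+1} + \cdots + \frac{1}{2^n} \binom{n}{n} \frac{n - (n-1)}{n+1} \\
&= \frac{1}{2^n (n+1)} \sum_{r=0}^{n} \binom{n}{r} (n + 1 - r) . \qquad (9)
\end{aligned}
\]

Now 

\[ \sum_{r=0}^{n} \binom{n}{r} (n + 1 - r) = (n+1) \sum_{r=0}^{n} \binom{n}{r} - \sum_{r=0}^{n} r \binom{n}{r} . \qquad (10) \]

But 

\[ \sum_{r=0}^{n} r \binom{n}{r} = n \sum_{r=1}^{n} \binom{n-1}{r-1} , \]

and hence 

\[ \sum_{r=0}^{n} r \binom{n}{r} = n 2^{n-1} . \]

Substitution in (10) thus yields 

\[
\begin{aligned}
\sum_{r=0}^{n} \binom{n}{r} (n + 1 - r) &= (n+1) 2^n - n 2^{n-1} \\
&= (n+2) 2^{n-1} ,
\end{aligned}
\]

and hence, from (9), 

\[ \Pr[B] = (n+2)/[2(n+1)] . \]

If this is to coincide with one of the initial probabilities (cf. Dodgson’s 
argument), then there is some $k \in \mathbb{N}$ such that 

\[ \frac{n+2}{2(n+1)} = \frac{(n+1) - k}{n+1} , \]

whence $k = n/2$. So Dodgson’s “proof” will not work if $n$ is odd. Note also 
that if $n = 2$, we have $k = 1$ and $(n + 1 - k)/(n + 1) = 2/3$, as in Dodgson’s 
example. 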
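The general result is readily confirmed for small n. The sketch below assumes the binomial prior b(n, 1/2) used above; `pr_black` and `matching_states` are illustrative names, and `r` counts the white counters among the original n.

```python
from fractions import Fraction
from math import comb


def pr_black(n):
    """Chance of drawing black after one black counter is added to n unknown counters."""
    return sum(
        Fraction(comb(n, r), 2 ** n) * Fraction(n + 1 - r, n + 1)
        for r in range(n + 1)
    )


def matching_states(n):
    """Numbers k of white counters for which Pr[B | state] equals Pr[B]."""
    target = pr_black(n)
    return [k for k in range(n + 1) if Fraction(n + 1 - k, n + 1) == target]


for n in range(1, 7):
    print(n, pr_black(n), matching_states(n))
```

A matching state exists only for even n, where k = n/2; for n = 3 the list is empty, which is the paradox noted by Eperson.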
9.10 Morgan William Crofton (1826-1915) 


In the ninth edition of the Encyclopedia Britannica in 1885 Crofton devotes 
twenty-one pages to the subject of probability⁵², a concept that he 
understands in the following sense: 


The probability, or amount of conviction accorded to any fact 
or statement, is thus essentially subjective, and varies with the 
degree of knowledge of the mind to which the fact is presented 
(it is often indeed also influenced by passion and prejudice, 
which act powerfully in warping the judgement), — so that, as 
Laplace observes, it is affected partly by our ignorance partly 
by our knowledge. [p. 768] 


The determination of such probability, however, is to be accomplished via 
frequencies, viz. 


In fact we may say, considering how seldom we know a priori 
the probability of any event, that the knowledge we have of such 
probability in any case is entirely derived from this principle, 
viz., that the proportion which holds in a large number of trials 
will be found to hold in the total number, even when this may 
be infinite, — the deviation or error being less and less as the 
trials are multiplied. [p. 769] 


The second section, after a general introduction, of this article is entitled 
“Probability of future events deduced from experience”. Here Crofton
illustrates, in several simple examples, the following general principle:
suppose that P_i denotes the antecedent probability of the ith cause°? C_i,
that p_i denotes the probability of an event E given C_i, and that a large
number N of trials have been made. Then


out of these the number in which the first cause exists is P_1 N,
and out of this number the cases in which the event follows are
p_1 P_1 N. [p. 773]


Continuing in this (frequentist) way Crofton finds the a posteriori
probability π_i of C_i to be

$$ \pi_i = \frac{p_i P_i}{\sum_j p_j P_j} . \tag{11} $$

It then follows that the probability of a further occurrence of the event is
Σ_i p_i π_i.
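
Crofton's counting argument behind (11) can be sketched as follows. This is only an illustration: the function names and the numerical values of the P_i and p_i are hypothetical, and exact fractions are used so that the counts out of N trials cancel visibly.

```python
from fractions import Fraction


def posterior_by_counting(priors, likelihoods, trials):
    """Posterior probabilities of the causes, obtained by counting cases.

    Out of `trials` cases, cause i is present in P_i * N of them, and the
    event follows in p_i * P_i * N of those.
    """
    event_counts = [p * P * trials for P, p in zip(priors, likelihoods)]
    total = sum(event_counts)
    return [count / total for count in event_counts]


def probability_of_recurrence(priors, likelihoods, trials):
    """Probability that the event occurs again: the sum of p_i * pi_i."""
    posteriors = posterior_by_counting(priors, likelihoods, trials)
    return sum(p * post for p, post in zip(likelihoods, posteriors))


# Hypothetical values, chosen only for illustration (not Crofton's).
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

posteriors = posterior_by_counting(priors, likelihoods, trials=6000)
recurrence = probability_of_recurrence(priors, likelihoods, trials=6000)
```

The number of trials drops out of the result, which is why the argument, although phrased in terms of frequencies, reproduces the usual discrete form of Bayes's Theorem.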

A further illustration is concerned with sampling from an urn contain- 
ing n white or black balls. If r drawings have resulted in white balls, the 
probability that the (r + 1)th draw will also yield a white ball is given by

$$ \frac{1}{n} \, \frac{\sum_{k=1}^{n} k^{r+1}}{\sum_{k=1}^{n} k^{r}} $$

if sampling is carried out with replacement, and
(r + 1)/(r + 2) 


(independent of n) in the case of sampling without replacement. 
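
Both results may be checked numerically. The sketch below assumes (the assumption is ours, though it is the one under which these formulae arise) that every number k of white balls in the urn is a priori equally likely; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def next_white_with_replacement(n, r):
    """Chance of a further white ball after r white draws, with replacement."""
    if r < 1:
        raise ValueError("at least one draw is required")
    numerator = sum(k ** (r + 1) for k in range(1, n + 1))
    denominator = n * sum(k ** r for k in range(1, n + 1))
    return Fraction(numerator, denominator)


def next_white_without_replacement(n, r):
    """Chance of a further white ball after r white draws, without replacement."""
    if not 1 <= r < n:
        raise ValueError("need 1 <= r < n")
    # With k white balls, r successive white draws have probability
    # comb(k, r) / comb(n, r); the common factor comb(n, r) cancels.
    weights = {k: comb(k, r) for k in range(r, n + 1)}
    total = sum(weights.values())
    return sum(
        Fraction(w, total) * Fraction(k - r, n - r) for k, w in weights.items()
    )
```

For every n tried the second function returns (r + 1)/(r + 2), while the first depends on n.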
Crofton next focuses attention on 


the important theorem of Bayes ... the object of which is to
deduce from the experience of a given number of trials, as to an 
event which must happen or fail on each trial, the information 
thus afforded as to the real facility of the event in any one trial, 
which facility is identical with the proportion of successes out 
of an infinite number of trials, were it possible to make them. 
[p. 774] 


(Note the emphasis on a limiting frequency approach.) This problem** 
Crofton phrases in terms of a “balls and urn” example, and he derives 


$$ p = \frac{\int_{\alpha}^{\beta} x^{m} (1-x)^{n} \, dx}{\int_{0}^{1} x^{m} (1-x)^{n} \, dx} \tag{12} $$

as “the probability that the ratio of the white balls in the urn to the whole
number lies between any two given limits α, β” [p. 774].

The usual extension to p+q further draws following m+n draws results
in the probability

$$ \frac{(p+q)!}{p!\, q!} \; \frac{\int_{0}^{1} x^{m+p} (1-x)^{n+q} \, dx}{\int_{0}^{1} x^{m} (1-x)^{n} \, dx} . $$
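
For integer m and n both (12) and its extension can be evaluated exactly. The following is a minimal sketch with hypothetical function names, in which (1 − x)^n is expanded binomially and integrated term by term.

```python
from fractions import Fraction
from math import comb


def beta_integral(a, b, lower=Fraction(0), upper=Fraction(1)):
    """Exact integral of x^a (1 - x)^b over [lower, upper], integers a, b >= 0."""

    def antiderivative(x):
        # Expand (1 - x)^b binomially and integrate term by term.
        return sum(
            comb(b, j) * (-1) ** j * x ** (a + j + 1) / (a + j + 1)
            for j in range(b + 1)
        )

    return antiderivative(Fraction(upper)) - antiderivative(Fraction(lower))


def interval_probability(m, n, alpha, beta):
    """Expression (12): chance that the ratio lies between alpha and beta."""
    return beta_integral(m, n, alpha, beta) / beta_integral(m, n)


def predictive(m, n, p, q):
    """Chance of p white and q black in p + q further draws."""
    return comb(p + q, p) * beta_integral(m + p, n + q) / beta_integral(m, n)
```

With p = 1 and q = 0 the second function reduces to the rule of succession, (m + 1)/(m + n + 2).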


Lidstone [1920, §18] extends this result by an analogous argument to the 


case of an urn containing balls of 2 different colours. However he in fact 
questions 


whether Crofton’s process ... can be considered strictly demon- 
strative; but it is at least simple, elegant, suggestive, and easily 
carried in the mind. [p. 191] 


Crofton deduces further that, when (m+n) is large, (12) reduces to

$$ \Pr\big[\, |p - m/(m+n)| < \delta \,\big] \sim \frac{2}{\sqrt{\pi}} \int_{0}^{\lambda} e^{-t^{2}} \, dt , $$

where λ = δ(m + n)^{3/2}/√(2mn).
In §IV Crofton turns his attention to the probability of testimony. He
begins by giving a classification, reminiscent of that given by Laplace, of 
the ways in which a witness may fail: 
he may be intentionally dishonest, or he may be mistaken; his
evidence may be false, either because he wishes to deceive, or 
because he is deceived himself. [p. 777] 


The first of a number of problems considered in this section is the follow- 
ing: suppose that a witness, of credibility p, states that a fact (or an event) 
occurred or did not occur. “If nothing was known a priori as to the
probability of the fact, or if its real facility was 1/2” [p. 777], then the
probability that the occurrence in fact took place is p. This answer Crofton
justifies by an argument involving a large number N of trials.

In the second question it is supposed that the witness says that he has 
seen the white ball drawn from a bag containing n balls, exactly one of 
which is white, the rest being black. Letting p denote the credibility of the 
witness and supposing that each ball is equally likely to be drawn, Crofton 
deduces that the probability w that the white ball was in fact drawn is
given by

$$ w = \frac{n^{-1} p}{n^{-1} p + (1 - n^{-1})(1 - p)} . $$
He notes that this probability is very small when n is large, unless p is 
nearly 1, and observes that 


We thus have a scientific explanation of the universal tendency
rather to reject the evidence of a witness than to accept the 
truth of a fact attested by him, when it is in itself of an ex- 
traordinary or very improbable nature. [p. 777] 
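
The behaviour of this probability is easily examined. In the sketch below the function name is hypothetical, and the values of n and p are chosen only for illustration.

```python
from fractions import Fraction


def prob_white_drawn(n, p):
    """Chance that the white ball was drawn, given that the witness says so.

    n is the number of balls (exactly one white); p is the credibility of
    the witness, who either reports truly or asserts the contrary.
    """
    p = Fraction(p)
    prior = Fraction(1, n)
    return prior * p / (prior * p + (1 - prior) * (1 - p))


few_balls = prob_white_drawn(2, Fraction(9, 10))
many_balls = prob_white_drawn(1000, Fraction(9, 10))
```

With two balls the answer is simply p, in agreement with the first problem; with a thousand balls a witness of credibility 9/10 leaves the probability at 1/112.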


In §37 Crofton considers the following problem:


Two independent witnesses, A and B, both state a fact, or that 
an event turned out in a particular way (only two ways being 
possible), to find the probability of the truth of the statement. 
[p. 777] 


On our denoting by p and p′ the credibilities of the witnesses, the desired
probability is seen to be

$$ w = \frac{pp'}{pp' + (1-p)(1-p')} . \tag{13} $$

Letting A_t and B_t respectively denote the events that A and B tell the
truth, and letting S_t denote the truth of the statement, we have

$$ \Pr[A_t \mid S_t] = p ; \quad \Pr[B_t \mid S_t] = p' ; \quad \Pr[S_t \mid A_t \wedge B_t] = w , $$

so that

$$ \Pr[S_t \mid A_t \wedge B_t] = \frac{\Pr[A_t \wedge B_t \mid S_t] \Pr[S_t]}{\Pr[A_t \wedge B_t \mid S_t] \Pr[S_t] + \Pr[A_t \wedge B_t \mid \overline{S}_t] \Pr[\overline{S}_t]} . $$

Under the assumptions that A and B report independently of each other
and that the prior probability that the statement is true is 1/2 this last
formula becomes

$$ \Pr[S_t \mid A_t \wedge B_t] = \frac{\Pr[A_t \mid S_t] \Pr[B_t \mid S_t]}{\Pr[A_t \mid S_t] \Pr[B_t \mid S_t] + \Pr[A_t \mid \overline{S}_t] \Pr[B_t \mid \overline{S}_t]} = \frac{pp'}{pp' + (1-p)(1-p')} , $$

as in (13). Crofton states the independence of the testimonies of A and B 
in his problem: the nature of the prior probability is given in his discus- 
sion of the question in the statement that, while nothing is known a priori 
about the event in question, a large number N of trials have yielded N/2
successes. 

A sequence of examples shows in essence that, if n witnesses of
credibilities p_1, p_2, ..., p_n agree in saying that a certain event has
occurred, the probability that the event did in fact occur is

$$ \prod_{i=1}^{n} p_i \bigg/ \bigg[ \prod_{i=1}^{n} p_i + \prod_{i=1}^{n} (1 - p_i) \bigg] , $$


and Crofton deduces that thirteen witnesses, each of credibility p = 9/10, 
are enough to make the chance more than an even one that a fact, the odds 
against whose occurrence are a billion®” to one, did actually occur. 
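
The figure of thirteen witnesses can be recovered as follows. Two assumptions of ours enter this sketch: the prior odds are combined with the testimony in the same way as in (13), and “a billion” is read as 10^12, the older British usage, under which thirteen is the smallest sufficient number (with 10^9 ten witnesses would suffice). The function names are hypothetical.

```python
from fractions import Fraction


def prob_event_occurred(credibilities, prior=Fraction(1, 2)):
    """Chance that the event occurred, all the witnesses asserting that it did."""
    true_side = Fraction(prior)
    false_side = 1 - Fraction(prior)
    for p in credibilities:
        true_side *= p        # every witness tells the truth
        false_side *= 1 - p   # every witness is in error
    return true_side / (true_side + false_side)


def witnesses_needed(p, odds_against):
    """Smallest number of agreeing witnesses giving a better than even chance."""
    prior = Fraction(1, odds_against + 1)
    count = 0
    while prob_event_occurred([p] * count, prior) <= Fraction(1, 2):
        count += 1
    return count


needed = witnesses_needed(Fraction(9, 10), 10 ** 12)  # assumes billion = 10^12
```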

A number of similar examples, in which there may be more than two 
possible outcomes, or the possible outcomes may not be equally probable, 
or the testimony may be discordant, with A making one avowal and B 
another, follow: we shall not discuss these here. 

More interesting, however, is the following question, which we shall quote 
in its entirety: 


suppose it has been found that a certain symptom (A) indicates
the presence of a certain disease in three cases out of four, there
is a probability 3/4 that any patient exhibiting the symptom has
the disease. This, however, must be considered in conjunction
with the a priori probability of the presence of the disease, if
we wish to know the value of the evidence deduced from the
symptom being observed. For instance, if we knew that 3/4 of
the whole population had the disease, the evidence would have
no value, and the credibility of the symptom per se would be 1/2,
telling us nothing either way. For if a be the a priori probability,
w that after the evidence, p the credibility of the evidence, we
have found

$$ w = \frac{ap}{ap + (1-a)(1-p)} \tag{14} $$

so that, if w = a, p = 1/2.
If w and a are given, the credibility p of the evidence is
deduced from this equation, viz.,

$$ p = \frac{(1-a)w}{a + w - 2aw} . $$

[p. 778]


Now there seems to be something strange about this: Crofton appears to 
be using (13), whereas (11) would seem more natural. For if we denote by 
D the presence of the disease, then 


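$$ \Pr[D \mid A] = w \ (= 3/4) ; \quad \Pr[D] = a \ (= 3/4) , $$

and hence

$$ \Pr[D \mid A] = \frac{\Pr[A \mid D] \Pr[D]}{\Pr[A \mid D] \Pr[D] + \Pr[A \mid \overline{D}] \Pr[\overline{D}]} $$

or

$$ w = \frac{\Pr[A \mid D]\, a}{\Pr[A \mid D]\, a + \Pr[A \mid \overline{D}]\, (1 - a)} . \tag{15} $$

To make this agree with Crofton’s expression we must set Pr[A|D] = p
(“the credibility of the evidence”)°*. But then Pr[A|D̄] ≠ 1 − p! Indeed, a
“translation” of (14) is

$$ \Pr[D \mid A] = \frac{\Pr[A \mid D] \Pr[D]}{\Pr[A \mid D] \Pr[D] + \Pr[\overline{A} \mid D] \Pr[\overline{D}]} , $$

which clearly differs from (15).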


Having expressed p in terms of w and a from (14) as

$$ p = \frac{(1-a)w}{a + w - 2aw} , $$


Crofton next considers the case in which two independent symptoms A and 
B occur. With 


Pr[D|A] = w ; Pr[D|B] = w′ ; Pr[D] = a,


Crofton concludes that “the value of the evidence of B” [p. 778] is, as
before,

$$ p' = \frac{(1-a)w'}{a + w' - 2aw'} , $$

and on combining this with w he finds the probability W “of the disease
where both symptoms occur” to be given by

$$ W = \frac{wp'}{wp' + (1-w)(1-p')} = \frac{(1-a)ww'}{(1-a)ww' + a(1-w)(1-w')} . $$

A numerical example follows. 
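
The algebra of (14), of its inversion, and of the two-symptom formula may be verified with exact fractions. The sketch below uses hypothetical function names, and its numerical values are ours, not those of Crofton's example.

```python
from fractions import Fraction


def posterior(a, p):
    """Equation (14): probability after the evidence."""
    return a * p / (a * p + (1 - a) * (1 - p))


def credibility(a, w):
    """Equation (14) solved for the credibility p of the evidence."""
    return (1 - a) * w / (a + w - 2 * a * w)


def two_symptoms(a, w1, w2):
    """Crofton's closed form when both symptoms occur."""
    return (1 - a) * w1 * w2 / ((1 - a) * w1 * w2 + a * (1 - w1) * (1 - w2))


# Hypothetical values, chosen only for illustration.
a, w1, w2 = Fraction(1, 10), Fraction(3, 4), Fraction(2, 3)
p2 = credibility(a, w2)
combined = w1 * p2 / (w1 * p2 + (1 - w1) * (1 - p2))
```

The value obtained by combining w with p′ agrees with the closed form, and w = a gives p = 1/2 as Crofton states.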


This method is now linked to Price’s rule of succession (see (12)) as
follows: if a coin tossed m times yields “heads” on every throw, then the
probability that the real facility for “heads” is greater than 1/2 is

$$ w = \int_{1/2}^{1} x^{m} \, dx \bigg/ \int_{0}^{1} x^{m} \, dx = 1 - 1/2^{m+1} , $$

there being a strong initial assumption that the facility is 1/2. Suppose now
that there is a very small a priori probability p 


that either in the coin itself or the way it is thrown there is 
something more favourable to head than to tail. [p. 779] 


After the observed sequence of m heads in m tosses this probability will
become

$$ \frac{pw}{pw + (1-p)(1-w)} = \frac{(2^{m+1} - 1)\,p}{(2^{m+1} - 2)\,p + 1} . $$
These results are then used in considering “the verdicts of juries, the 
decisions of courts, and the results of elections” [p. 779]: we shall not pursue 
the matter here. 
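
The reduction of pw/[pw + (1 − p)(1 − w)] to its closed form in the coin example may be confirmed as follows; the function names and the value of p are hypothetical.

```python
from fractions import Fraction


def prob_facility_exceeds_half(m):
    """Ratio of the integrals of x^m over [1/2, 1] and over [0, 1]."""
    return 1 - Fraction(1, 2 ** (m + 1))


def updated_bias_probability(p, m):
    """Probability of a bias towards heads after m heads in m tosses."""
    w = prob_facility_exceeds_half(m)
    return p * w / (p * w + (1 - p) * (1 - w))


# A small prior probability of bias, chosen only for illustration.
after_ten_heads = updated_bias_probability(Fraction(1, 1000), 10)
```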


9.11 Johannes von Kries (1853-1928) 


In 1886 von Kries published his thought-provoking book Die Principien 
der Wahrscheinlichkeits-Rechnung. Eine logische Untersuchung, in which 
detailed discussion of the subjective theory may be found®’. We shall re- 
strict our attention here, inasmuch as it is possible, to directly relevant 
matters. 

Von Kries provides a precise definition of equiprobable events, viz.¹


als gleich möglich zwei oder mehrere Fälle anzusehen sind, wenn
in dem jeweiligen Stande unserer Kenntnisse sich kein Grund
findet, unter ihnen einen für wahrscheinlicher als irgend einen
anderen zu halten. [p. 6]


Here we clearly see the subjective basis on which von Kries’s work rests°®, 
though that he himself viewed it in a logical sense is evinced by the following 
passage: 


Diese Deutung — wir wollen sie kurz als die logische Deutung 
bezeichnen, und das Princip, auf welches sie die Wahrschein- 
lichkeits-Rechnung basirt, als Princip des mangelnden Grundes 
— scheint auf den ersten Blick völlig zu befriedigen. [p. 6]


¹References throughout this section are to the second edition of 1927.
In the second chapter, Article 3, we find a simple example of the rule of
succession: one of two playing-cards lying face-down on a table is turned 
over and found to be black. The probability that the second card is black 
is, according to Poisson, 2/3, although one might ingenuously expect it to 
be 1/2. In a footnote von Kries mentions that the method illustrated here 
is based on the so-called Bayes’s principle. 

In Article 1 of the fourth chapter we find a further statement in favour 
of a subjective interpretation of probability, viz. 


jede Wahrscheinlichkeit ist subjectiv, der Ausdruck und die 
Folge unseres ungenauen oder unvollständigen Wissens; eine
„objective Wahrscheinlichkeit“ dagegen ist ein Unding, eine
contradictio in adjecto. [p. 77] 


The next pertinent comment appears in Chapter V, “Die Arten der nu- 
merischen Wahrscheinlichkeit.” Here von Kries considers the case of six 
dice, with die i having i faces marked with a “+” and 6 − i faces marked
with a “0”. If a die is drawn at random and tossed three times yielding the
sequence +, 0, +, what is the probability that the die chosen was the first,
second, &c.? The problem is solved in the usual way using


das sogenannte Bayes’sche Princip, welches mit Recht als einer 
der wichtigsten Sätze der Wahrscheinlichkeits-Theorie angesehen
wird. [p. 118]
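
The solution “in the usual way” may be sketched as follows. The function name is hypothetical; since the die is drawn at random the six prior probabilities are equal and cancel.

```python
from fractions import Fraction


def die_posteriors(sequence, faces=6):
    """Posterior probability of each die, die i having i faces marked '+'."""
    weights = []
    for i in range(1, faces + 1):
        plus = Fraction(i, faces)
        likelihood = Fraction(1)
        for symbol in sequence:
            likelihood *= plus if symbol == "+" else 1 - plus
        weights.append(likelihood)  # the equal priors 1/6 cancel
    total = sum(weights)
    return [w / total for w in weights]


posteriors = die_posteriors("+0+")
```

The weights are proportional to i²(6 − i), so that the fourth die is the most probable, and the sixth, which bears no “0”, is excluded.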


The more general expression p_i a_i / Σ p a is also given, and is described as
the rule for deducing, as a consequence of Bayes’s Theorem, “der ‚Ursache
beobachteter Ereignisse‘ ”. It is also stated that the use of Bayes’s Principle
is not uncontroversial. The continuous analogue of the discrete expression
given above, viz.

$$ \int_{a}^{b} \varphi(x - x_0) \, dx \bigg/ \int_{-\infty}^{\infty} \varphi(x - x_0) \, dx , $$

is found in Article 5.

In Chapter VI, “Die Gewinnung und Begründung von
Wahrscheinlichkeits-Sätzen”, we find the assertion that, if a large number (n + m) of draws
from an urn have yielded n black and m white balls, the probability that 
the next draw will yield a black ball is approximately n/(n +m). 


Man bezeichnet diese Verfahrungsweise als eine a posteriorische 
Wahrscheinlichkeits-Bestimmung. [p. 133] 


A work of this nature would of course be incomplete without mention 
of earlier work on the probability of testimony, and von Kries accordingly 
turns his attention to this problem, one that among all applications of
probability is “vielleicht die merkwürdigste” [p. 253], in the twelfth section of his
ninth chapter “Weitere Anwendungen der Wahrscheinlichkeits-Rechnung.”
He argues that Laplace and Poisson reached their results from erroneous 
assumptions, an argument whose details we shall not give here, apart from 
mentioning that von Kries finds the absence of independence a major draw- 
back. 

The tenth and final chapter is devoted to the history of probability 
theory. Von Kries comments on the similarity between the “etwas
schwerfällige” [p. 267] definition of probability given by Bayes and that given
a century earlier by Huygens in his De Ratiociniis in Ludo Aleae, the first
proposition in which work reads 


Si a vel b expectem, quorum utrumvis æquè facilè mihi obtingere
possit, expectatio mea dicenda est valere (a + b)/2. [Bernoulli, 
1713, p. 4] 


(Huygens’s tract, written in Dutch in 1656, was published in a Latin version 
in 1657, the original only being printed in 1660. It is of no little interest, 
though too far from our present concern, to compare the above statement 
of the first proposition with the Dutch version, which reads 


Als ick gelijcke kans hebbe om a of b te hebben, dit is my so
veel weerdt als (a + b)/2.


For further details see Hald [1990a, §§6.1 & 6.2].)
Further reference to Bayes’s essay occurs later in this chapter, where von 
Kries brings in objective probability with the words 


Nach der von Bayes aufgestellten Regel pflegte man anzunehmen,
dass ehe eine Erfahrung vorliegt, jeder Wert einer (objec-
tiven) Wahrscheinlichkeit gleich wahrscheinlich ist. [p. 277] 


This rule becomes a method that may be applied, without any further 
thought, as soon as we have equally probable cases. Thus if n trials result in 
m known outcomes, one can give a determinate (“bestimmte” ) probability 
that the probability of the outcome in question lies between (m/n) − δ and
(m/n) + δ, a probability that may well be large even for moderately small
values of δ; and moreover one may, with some degree of accuracy, ascertain
any probability (this latter phrase, “jene Wahrscheinlichkeit bestimmen” , 
is given within quotation marks in the original). And this rule can be used 
to determine the probability that 


bei einer Anzahl neuer Fälle wieder die relative Häufigkeit des
betreffenden Verlaufes in irgend welchen Grenzen liegen werde. 
[p. 278] 
9.12 George Francis Hardy (1855-1914)


In 1889 some remarks by Hardy were published in volume 227 of the
Insurance Record. The substance of these comments was republished in an
editorial note to Whittaker [1920], and it is to this note that reference is 
made in the present discussion. 

The correspondence in the Insurance Record arose in connexion with 
the following problem from Ackland and Hardy’s Graduated Exercises and 
Examples: 


If the experience of a given mortality table indicates that, out 
of 2000 persons alive at age 30, 29 die before attaining age 31, 
is it theoretically correct to say that the probability of a person 
age 30 dying before 31 = 29/2000? [p. 174] 


The answer given runs as follows: 


If out of (m+n) trials the result A has happened m times and 
the result B n times, then the probability that the next trial 
will produce the result A is strictly (m + 1)/(m + n + 2), or in
the present case 30/2002 (De Morgan on Probabilities, chap. ii.
p. 65). This result is, however, based upon the assumption that
all values of the required probability are a priori equally likely,
which cannot be said to be true with regard to the probabilities 
of death. [p. 175] 


Commenting on this, a reviewer finds the solution®? (m + 1)/(m + n + 2)
preferable to Bernoulli’s m/(m +n), but he finds the equally probable 
assumption a very obvious requirement in a mortality situation like this. 
To this latter comment Hardy takes exception, stating that 


As regards the probabilities of dying in a year, however, we 
know that the assumption is entirely incorrect at nearly all 
ages, and I fail to see how in a practical problem such as the 
constructing of a mortality table our results are to be improved 
by introducing an assumption known to be erroneous. [p. 176] 


As an illustration Hardy supposes that, of 1000 lives exposed to risk at age 
70, 900 survive to age 71 and 800 to age 72. Using Laplace’s formula one 
finds that 

p₇₀ = 901/1002 , p₇₁ = 801/902 ;


and hence ₂p₇₀ = (901 × 801)/(1002 × 902), which is not equal to the ratio
801/1002 obtained from the original data, 


a different result, but one which has just the same claims to 
acceptance as the former, as there is no special sacredness in 
the year as a measure of time. [p. 176] 
Hardy concludes further that, while the usual formula (viz. m/(m + n))
may be open to a theoretical objection, no better formula can, in his opin- 
ion, be found. 

The reviewer replied promptly to Hardy’s letter, saying that, in his
opinion, Hardy’s comments showed that the deduction of (m + 1)/(m + n + 2)
from the formula P_r p_r / Σ P_r p_r was wrong. He therefore rehearsed the
usual deduction of the first of these formulae (in a mortality context),
noting on the way that the most probable value, viz. x = m/(m + n), of the
facility of the event (or the probability of surviving a year), was given by
maximizing x^m(1 − x)^n.

Attention was then focused on Hardy’s more general problem, which the 
reviewer framed as follows: (m + n + p) lives, each aged k, are to be observed
for 2 years: of these, (m + n) survive the first year and m the second. From
these observations it is required to find, for another person aged k years, 
the probabilities [p. 178] 


(i) that he will die in the first year; 

(ii) that he will die in the second year;
(iii) that he will survive the first year; 
(iv) that he will survive the second year. 


Denoting by π_k and π_{k+1} the facilities of surviving a year at ages k and
(k+1) respectively, the reviewer shows that

$$ \begin{aligned}
\Pr[x < \pi_k < x + dx \ \&\ y < \pi_{k+1} < y + dy]
 &= \frac{x^{m+n} (1-x)^{p}\, y^{m} (1-y)^{n} \, dx \, dy}{\int_{0}^{1}\!\int_{0}^{1} x^{m+n} (1-x)^{p}\, y^{m} (1-y)^{n} \, dx \, dy} \\
 &= \frac{(m+n+p+1)!}{(m+n)!\; p!} \, \frac{(m+n+1)!}{m!\; n!} \, x^{m+n} (1-x)^{p}\, y^{m} (1-y)^{n} \, dx \, dy .
\end{aligned} $$

Multiplication of this result by (1 − x), x(1 − y), x and xy in turn and
integration then yield the desired probabilities, viz.


(i) (p + 1)/(m + n + p + 2) ;

(ii) (m + n + 1)(n + 1)/[(m + n + p + 2)(m + n + 2)] ;

(iii) (m + n + 1)/(m + n + p + 2) ;

(iv) (m + n + 1)(m + 1)/[(m + n + p + 2)(m + n + 2)].
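
The four probabilities may be checked by carrying out the integrations exactly. The sketch below (function names hypothetical) uses the standard value a! b!/(a + b + 1)! of the integral of x^a(1 − x)^b over [0, 1].

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact integral of x^a (1 - x)^b over [0, 1], integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def reviewer_probabilities(m, n, p):
    """Probabilities (i) to (iv), by integrating against the joint density."""
    mean_x = beta_integral(m + n + 1, p) / beta_integral(m + n, p)
    mean_y = beta_integral(m + 1, n) / beta_integral(m, n)
    return {
        "i": 1 - mean_x,               # dies in the first year
        "ii": mean_x * (1 - mean_y),   # dies in the second year
        "iii": mean_x,                 # survives the first year
        "iv": mean_x * mean_y,         # survives the second year
    }


# Hardy's illustration: 1000 lives, 900 survive one year and 800 two.
hardy = reviewer_probabilities(m=800, n=100, p=100)
```

With Hardy's figures, (iv) is the product (901 × 801)/(1002 × 902) that Hardy himself obtained year by year.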


Hardy was unimpressed: he remarked firstly that the discrepant results 
he himself had given were based upon incompatible assumptions 
in the first case all values of p_x and p_{x+1} and in the second case
all values of ₂p_x being assumed equally likely a priori. [p. 180]


Further, there was no reason for regarding one of these assumptions as 
better than the other. He stressed further the importance of a suitable 
choice of the P_r, and suggested that they be chosen to®°


form a series which may be fairly represented by a curve of the 
form x^r(1 − x)^s, where the relative values of r and s will depend
on the most probable value of p_x (or x), and their absolute
values on the extent of our prior knowledge of that function. 
[p. 181] 


This results in

$$ \int_{0}^{1} x^{m+r+1} (1-x)^{n+s} \, dx \bigg/ \int_{0}^{1} x^{m+r} (1-x)^{n+s} \, dx = (m + r + 1)/(m + n + r + s + 2) . $$

If one excludes from consideration the observations at age k, m and n
become zero and one is left with the a priori best estimate of p_x as
p′_x = (r + 1)/(r + s + 2).

Although this concluded the discussion in the Insurance Record, the ques- 
tion was reopened by Whittaker in 1920, the sixth section of his paper 
being entitled Hardy’s paradox®'. Whittaker finds that Hardy’s paradox
arises from “a misapplication of the Bayes-Laplace theory” [p. 171], the 
correct application of which, in his opinion, runs as follows. Let H denote 
the hypothesis that the probability of a man aged 70 dying in his 71st year 
lies between x and x + dx, and that the probability of a man aged 70 dying
in his 72nd year lies between y and y + dy. Then

$$ \Pr[H \mid \text{Hardy's data}] = \frac{x^{100} y^{100} (1 - x - y)^{800} \, dx \, dy}{\iint x^{100} y^{100} (1 - x - y)^{800} \, dx \, dy} , $$

the integral being taken over the set {(x, y) : x ≥ 0, y ≥ 0, x + y ≤ 1}.
The probability that “in a subsequent experience” [p. 172] a 70 year old 
man will die in his 71st year is found, in the usual manner, to be 101/1003, 
while the probability (also in a subsequent experience) that a man aged 70 
will die in his 72nd year is 101/1003. Thus 


p₇₀ = 902/1003 , ₂p₇₀ = 1 − 202/1003 = 801/1003 .
And since, in the usual manner, p₇₁ = 801/902, it is clear that


₂p₇₀ = p₇₀ p₇₁


as required. Whittaker emphasizes that the same prior knowledge is to be 
used in determining these three quantities, and that this is the cause of 
Hardy’s paradox. 
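
Whittaker's figures follow from the mean of the posterior density above. The sketch below assumes the standard result that, under a uniform prior over the three outcomes, the probability of outcome i in a subsequent trial is (c_i + 1)/(total + 3); the function name is hypothetical.

```python
from fractions import Fraction


def subsequent_probabilities(counts):
    """(c_i + 1)/(sum of counts + number of outcomes), uniform prior assumed."""
    total = sum(counts) + len(counts)
    return [Fraction(c + 1, total) for c in counts]


# Hardy's data: 100 die in the 71st year, 100 in the 72nd, 800 survive both.
die_71st, die_72nd, survive_both = subsequent_probabilities([100, 100, 800])

p70 = 1 - die_71st
two_p70 = 1 - die_71st - die_72nd
p71 = Fraction(800 + 1, 900 + 2)  # "in the usual manner", from 800 out of 900
```

On this single prior the two-year probability is the product of the two yearly ones, so that the discrepancy noted by Hardy disappears.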
The editor in fact has noticed that the solution presented by Whittaker
differs from that of the reviewer: 


The difference is due to their formulating in two different ways 
(both legitimate) the state of complete a priori ignorance. 


[p. 179] 


Discussion of a more general problem than that considered by Whittaker 
follows the latter’s paper. 

In his comments on Whittaker’s paper [1920] Lidstone states that he 
finds Hardy’s suggested prior “a highly valuable one” [p. 196]. Moreover, 
he regards Hardy’s solution as correct since the Bayes-Laplace theory is 
inapplicable “because there is no fixed basis on which it can be applied” 
[loc. cit.]. Stress is further laid on the fact that 


The change in the unit of time radically changes the resulting 
probability; yet mathematically one unit of time is as good as 
another. [p. 198] 


Thus an indeterminate constant (i.e. the time unit) is involved in the for- 
mula, and hence the formula itself yields an indeterminate value. This Lid- 
stone believes to be 


essentially the argument of Bing and Hardy, and I must confess 
I do not think that Professor Whittaker has made any serious 
attempt to meet it. [p. 198] 


Furthermore, 


The conclusion I reach is... that the formula is inapplicable 
where the “event” is capable of division in point of time or any 
other measurement. [p. 198] 


In his reply to the discussion Whittaker suggested essentially that com- 
mon sense would dictate the choice of time unit, a suggestion that Lidstone 
was loth to accept in view of its appeal to experience and its contradiction 
of the assumption of a priori ignorance. The same sentiment was echoed
in a letter®? by Nicholl. 


9.13 Joseph Louis François Bertrand (1822-1900)


In 1889 Bertrand published®? his Calcul des Probabilités: the third edi- 
tion of 1972, to which reference will be made here, is a textually unal- 
tered reprint®* of the second edition of 1907. In this edition also appears 
Bertrand’s “Les lois du hasard” of 1884, a general essay on chance of no 
little interest. 
Turning to the book itself, we find in Chapter I, entitled “Énumération
des chances”, the following definition: 


La probabilité d’un événement est le rapport du nombre des cas 
favorables au nombre total des cas possibles. Une condition est 
sous-entendue: tous les cas doivent être également possibles.


[p. 2] 


Many illustrative examples, on which we shall not spend time, follow (in- 
deed the whole work is amply illustrated). 

Let us pass on immediately to Chapter VII, “Probabilité des causes”.
The term “causes” is defined at the outset thus: 


Les causes sont pour nous des accidents qui ont accompagné ou 
précédé un événement observé. Le mot n’implique pas qu’au
sens philosophique l’événement soit un effet produit par la cause. 
[pp. 142-143] 


In §115 we find the following statement of the general problem (there is no 
mention of Bayes): 


Diverses causes E_1, E_2, ..., E_n ont pu produire un événement
observé. Les probabilités de ces causes, lorsque le résultat n’était
pas encore connu, étaient ϖ_1, ϖ_2, ..., ϖ_n. L’événement se produit;
la cause E_i, lorsqu’on est certain que c’est elle qui agit,
donne à l’événement la probabilité p_i. Quelle est la probabilité
de chacune des causes qui sont, on l’admet, les seules possibles?


[p. 144] 


Bertrand shows that the solution is given by

$$ \Pr[E_i \mid \text{event}] = p_i \varpi_i \bigg/ \sum_{j=1}^{n} p_j \varpi_j . $$


Several applications follow. The first of these is concerned with the
composition of an urn of μ balls (white or black in unknown proportion).
Suppose that k draws, with replacement, have all resulted in white balls. If N
is a random variable denoting the number of white balls in the urn, then


$$ \Pr[k \text{ white drawn} \mid N = n] = (n/\mu)^{k} , $$

and hence, under the assumption that all compositions of the urn are a
priori equally possible,

$$ \Pr[N = n \mid k \text{ white drawn}] = \frac{\Pr[k \text{ white drawn} \mid N = n]}{\sum_{j=0}^{\mu} \Pr[k \text{ white drawn} \mid N = j]} = (n/\mu)^{k} \bigg/ \sum_{j=0}^{\mu} (j/\mu)^{k} = n^{k} \bigg/ \sum_{j=0}^{\mu} j^{k} . $$

Thus the probability that all balls are white is

$$ \Pr[N = \mu \mid k \text{ white drawn}] = \mu^{k} \bigg/ \sum_{j=0}^{\mu} j^{k} . $$


The a priori assumption is then relaxed in a particular numerical example,
a further example illustrating the case of sampling without replacement. 
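
Under the equally-possible assumption the posterior distribution of the composition of the urn is easily tabulated. In this sketch the function name and the numerical values are hypothetical.

```python
from fractions import Fraction


def composition_posterior(mu, k):
    """Posterior of the number of white balls among mu, after k white draws.

    Draws are made with replacement and all compositions are taken to be
    equally possible a priori, so the posterior is proportional to n^k.
    """
    weights = [n ** k for n in range(mu + 1)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


posterior = composition_posterior(mu=2, k=2)
all_white = posterior[-1]
```

With two balls and two white draws the probability that both balls are white is 4/5.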

The second problem is concerned with the same situation, except that 
now the μ draws result in m white and n black balls. In this case, and again
under an equally-likely assumption, Bertrand finds that the most probable 
composition of the urn is that which makes the probabilities of the drawing 
of white or black balls proportional to the number of times they appear. 

Denoting, in the above problem, the probability of drawing a white ball 
by x, Bertrand notes that “Chaque hypothèse sur la valeur de x a une
probabilité” [p. 149]. Starting with the function x^m(1 − x)^n he deduces, in
the usual sort of way, that the limiting probability is proportional to

$$ \exp\left( -\varepsilon^{2} (m + n) / 2pq \right) $$

where p = m/(m + n) = 1 − q and x = p − ε. He also notes the difference
between this result and the similar expression obtained in his study of 
Bernoulli’s Theorem, this difference being determined by what is known 
(i.e. p or n) in the two cases. 

As Sheynin [1994, §6] has noted, there is in fact a further difference 
between these two formulae. For if X ~ b(n, p), then the local de Moivre- 
Laplace Limit Theorem gives 


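$$ \lim_{n \to \infty} \Pr\left[ \alpha \le \frac{X - np}{\sqrt{np(1-p)}} \le \beta \right] = \Phi(\beta) - \Phi(\alpha) , $$

where Φ denotes the standard Normal distribution function and p is
supposed known. However the inverse Bernoulli Theorem yields

$$ \lim_{n \to \infty} \Pr\left[ \alpha \le \frac{P - x/n}{\sqrt{x(n - x)/n^{3}}} \le \beta \right] = \Phi(\beta) - \Phi(\alpha) , $$

where P is now unknown. Notice that
Var(X) = npq while Var(P) = x(n − x)/n³.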
Bertrand observes immediately that 


La formule précédente est déduite d’une hypothèse qui se réalisera
rarement. Toutes les probabilités désignées par x ont, en général,
a priori, des valeurs inégales [p. 151],
and follows this up with the following problem: from an urn containing N
balls μ drawings have resulted in m white and n black balls, where initially
the probability of drawing either of these colours is 1/2. What is the most
probable composition of the urn? Under the assumption that N is large it
is shown that the solution is given by (N + 2m)/[2(N + m + n)]. Numerical
variations on this theoretical theme follow®. 

Attention is next turned to the regularity in the ratio of male to fe- 
male births, reference being made to work by Nicolas Bernoulli, Buffon 
and Laplace, and also to some miscellaneous problems. 

In Article 136 Bertrand turns his attention to the probability of future 
events. As an example he considers the drawing of balls from an urn under 
the assumption®® that “Toutes les suppositions sont également possibles” 
[p. 172]. If μ draws have resulted in m white and n black balls then ’tis
found, in the usual way, that the probability that the (μ + 1)th draw will
yield a white ball is

$$ \frac{m + 1}{m + n + 2} . $$
Turning to applications of this rule, Bertrand writes 


Les applications faites de cette formule ont été presque toutes 
sans fondement [p. 173], 


a sentiment that he illustrates by the example of the sun’s rising tomorrow, 
given that it has risen daily for 6,000 years. Assimilating this to the repeated 
drawing of white balls from an urn, he finds that the probability of one 
further white given 2191500 white is 0.999999543: “Est-il besoin d’insister 
sur l’insignifiance d’un tel calcul?” [p. 174]. In the introduction we in fact
find the further comment on the equating of these two cases: 


L’assimilation n’est pas permise: l’une des probabilités est
objective, l’autre subjective. [p. xix]
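
The figure quoted for the sun's rising is reproduced by the rule of succession with m = 2191500 and n = 0. In the sketch below the function name is hypothetical; that 6,000 years of 365.25 days give the count of days is our observation, not a statement in the text.

```python
from fractions import Fraction


def rule_of_succession(whites, blacks=0):
    """(m + 1)/(m + n + 2) after m white and n black draws."""
    return Fraction(whites + 1, whites + blacks + 2)


days = 2191500  # 6000 * 365.25, assuming years of 365.25 days
probability = rule_of_succession(days)
as_decimal = float(probability)
```

The value is 2191501/2191502, which agrees with the 0.999999543 given above to the nine places quoted.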


Chapter XIII, “Probabilités des décisions” , contains only one article, en- 
titled “Résumé critique des tentatives faites pour appliquer le Calcul des 
probabilités aux décisions judiciaires.” The description is accurate: Con- 
dorcet, Laplace, Poisson and Cournot all come under the spotlight. Thus, 
writing of Condorcet’s Essai Bertrand says


Aucun de ses principes n’est acceptable, aucune de ses conclu- 
sions n’approche de la vérité. [p. 319] 


Successors to Condorcet, while recognizing the insufficiency of his formulae, 
were not able to provide anything better: indeed 


Laplace a rejeté les résultats de Condorcet, Poisson n’a pas 
accepté ceux de Laplace; ni l’un ni l’autre n’a pu soumettre 
au calcul ce qui y échappe essentiellement: les chances d’erreur 
d’un esprit plus ou moins éclairé, devant des faits mal connus 
et des droits imparfaitement définis. [pp. 319-320] 
Further,


Ni Cournot ni Poisson n’ont commis la plus petite faute comme
géomètres; ils traduisent rigoureusement leurs hypothèses. Mais
les hypothèses n’ont pas le moindre rapport avec la situation
d’un accusé devant les juges. [p. 326] 


The criticism is just and reasonable®’, and the conclusion may perhaps be 
drawn that such matters are perhaps not completely suited to probabilistic 
examination. 


9.14 George Chrystal (1851-1911) 


The substance of an address delivered by Chrystal before the Actuarial 
Society of Edinburgh was published in 1891 in the Transactions of that 
body under the title “On some fundamental principles in the theory of 
probability”. Following on the pioneering work of Venn, Chrystal proposed 
in this paper merely 


to state a little more clearly, from the mathematical point of 
view, the reductio ad absurdum of the rules of Inverse
Probability. [p. 421]


On “the view of Probability which has been gaining ground of recent 
years” [p. 422], by which is no doubt meant the frequency theory espoused 
by Venn, Chrystal finds Laplace erring (if not sinning) in basing probability 


ultimately on a mere condition of the human mind, instead of 
resting it ultimately upon human experience of the objective 
world [p. 422], 


a position adopted by some of his (i.e. Laplace’s) followers, especially de 
Morgan®®. While Boole seemed to be trying to break this stranglehold®?, 
the grip of the past was perhaps too strong, and it was left to Venn, with 
his concept of the probability of a series, to fill the lacuna in Laplace’s 
theory. 


Chrystal defines the probability (or chance) of an event as follows”: 

If, on taking any very large number N out of a series of cases in 
which an event A is in question, A happens on pN occasions, 
the probability of the event A is said to be p. [p. 426] 


He stresses that “probability is not an attribute of any particular event 
happening on any particular occasion”” [p. 426], and adds to this caveat 
the corollary that 


no information of any value regarding the probability of an 
event can be gathered from one or from a small number of 
observations. [p. 426] 

The sixth and seventh principles of Laplace’s Essai, those concerned with 
inverse probability, are next recalled, and the following example, also due 
to Laplace, considered: two drawings are made, with replacement, from an 
urn containing two balls, each of which may be either black or white. If 
these two draws both yield white, what is the probability that the next 
ball to be drawn will also be white? Chrystal draws attention to Laplace’s 
assumption that the two possible hypotheses as to the composition of the 
urn are equally likely, pointing out that this is not necessarily the case. 
A reading of Laplace’s solution in the Essai shows that the desideratum 
$\Pr[W_3 \mid W_1, W_2]$ is obtained from 

\[ \Pr[W_3 \mid W_1, W_2] = \sum_{i=1}^{2} \Pr[H_i \mid W_1, W_2] \, \Pr[W_3 \mid H_i], \]

a result that obtains under the assumption that, for each $i \in \{1, 2\}$, 

\[ \Pr[W_1 W_2 W_3 \mid H_i] = \Pr[W_1 W_2 \mid H_i] \, \Pr[W_3 \mid H_i]. \]


The answer is 9/10. 
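
The figure of 9/10 may be checked by direct enumeration. What follows is a minimal sketch only, assuming (with Laplace) that the hypotheses "both balls white" and "one white, one black" are equally likely a priori; the names `priors`, `p_white` and `predictive_white` are illustrative and appear in none of the works discussed.

```python
from fractions import Fraction

# Hypotheses about the two-ball urn: H1 = both white, H2 = one white, one black.
# Assumption (Laplace's, queried by Chrystal): the two are equally likely a priori.
priors = [Fraction(1, 2), Fraction(1, 2)]
p_white = [Fraction(1), Fraction(1, 2)]   # chance of white on one draw under each H_i

def predictive_white(priors, p_white, k):
    """Pr[next draw white | k white draws with replacement]."""
    # Pr[H_i] * Pr[W_1 ... W_k | H_i], the draws being independent given H_i
    weights = [pr * pw**k for pr, pw in zip(priors, p_white)]
    total = sum(weights)
    posterior = [w / total for w in weights]          # Pr[H_i | W_1 ... W_k]
    return sum(post * pw for post, pw in zip(posterior, p_white))

result = predictive_white(priors, p_white, 2)
print(result)
```

Altering `priors` shows at once how far the answer depends on the assumption to which Chrystal draws attention.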

One of what Chrystal terms “the grand results of this method” [p. 428] 
is the rule of succession. Reference is made to Laplace’s use of this result in 
connexion with the sun’s rising (comparison with Buffon’s treatment being 
drawn) and to a simple example from Crofton [1885, p. 774], which receives 
scornful treatment at Chrystal’s hands’?. The same hands now turn to the 
manipulation of several problems: we shall consider them seriatim. 


Problem I. Given a bag containing three balls, each of which may be 
black or white, to find the probability of drawing a black ball. 
[p. 429] 


Chrystal notes that the problem, as stated, is quite indeterminate’?, and 
stresses the need for the definition of an appropriate “series” for its solu- 
tion. Two hypotheses are suggested, viz. 


(A) all numbers of white balls will occur equally often in the long-run; 
(B) each ball will be black or white equally often in the long-run. 

Under these assumptions the four possible constitutions of the bag 

{(W, B)} = {(0, 3), (1, 2), (2, 1), (3, 0)} 

will occur in a large number N of trials with frequencies 

(A) N/4, N/4, N/4, N/4, and 
(B) N/8, 3N/8, 3N/8, N/8 


respectively. Under either hypothesis the desired probability is 1/2. 

Problem II. Given a bag which contains one white ball and two others, 
each of which may be either white or black, what is the probability 
of drawing a white ball? [p. 430] 


In this case the possible constitutions 

{(W, B)} = {(1, 2), (2, 1), (3, 0)} 

of the urn are considered subject to the hypotheses 

(A) of the unknown balls 0, 1 or 2 white are equally likely, and 
(B) each ball in the bag is equally likely to be black or white. 

In a large number N of trials the possible constitutions will then occur 
with frequencies 

(A) N/3, N/3, N/3, and 
(B) 3N/7, 3N/7, N/7, 

the required probability being 2/3 or 4/7 respectively. 
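
The long-run frequencies in Problems I and II lend themselves to a short computation. The sketch below is offered under stated assumptions: hypothesis (B) of Problem II is read, as Chrystal's frequencies 3 : 3 : 1 suggest, as the binomial scheme with the all-black bag excluded, and the helper `prob_colour` is a hypothetical name introduced here.

```python
from fractions import Fraction
from math import comb

def prob_colour(freqs, counts, total=3):
    """Long-run chance of drawing the colour counted in `counts`,
    given the relative frequencies `freqs` of the constitutions."""
    norm = sum(freqs)
    return sum(Fraction(f, norm) * Fraction(c, total) for f, c in zip(freqs, counts))

# Problem I: W = 0, 1, 2, 3 white balls, B = 3 - W black balls.
whites = [0, 1, 2, 3]
blacks = [3 - w for w in whites]
freq_A = [1, 1, 1, 1]                    # (A) every number of white equally often
freq_B = [comb(3, w) for w in whites]    # (B) each ball white or black equally often
problem1 = (prob_colour(freq_A, blacks), prob_colour(freq_B, blacks))

# Problem II: one ball is known to be white, so W = 1, 2, 3.
whites2 = [1, 2, 3]
freq2_A = [1, 1, 1]                      # (A) 0, 1 or 2 unknown balls white, equally often
freq2_B = [comb(3, w) for w in whites2]  # (B) assumed reading: 3 : 3 : 1
problem2 = (prob_colour(freq2_A, whites2), prob_colour(freq2_B, whites2))

print(problem1)
print(problem2)
```

The two hypotheses agree in Problem I and part company in Problem II, which is the point of Chrystal's insistence on a properly defined "series".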


Problem III. Given a bag which contains three balls. A ball is drawn, 
found to be white, and returned to the bag: calculate the probability 
of drawing a white ball on another trial. [p. 431] 


This is merely Problem II in an alternative form. 


Problem IV. A white ball having been drawn from a bag containing 
three, required the probabilities that the bag from which it was drawn 
contained — 


{(W, B)} = {(3, 0), (2, 1), (1, 2)} = {1°, 2°, 3°} 
respectively. [p. 431] (notation altered) 


Here some assumption as to the series is again required; but Chrystal first 
finds it necessary (at this stage!) to explain the meaning of the word “prob- 
ability”: 


Let a large number M of bags, each of which is filled with one 
white ball and two others, the occurrence of which is regulated 
in some given or supposed way, say on Hypothesis (A) or Hy- 
pothesis (B) as above, required the numbers pM, qM, rM of 
these cases in which when a white ball was drawn it came from 
bags having the constitutions 1°, 2°, 3°, respectively. 

[pp. 431-432] 


He also emphasizes that conditional probabilities are required. Under the 
respective assumptions of initial frequencies 

(A) N/3, N/3, N/3 and 
(B) N/7, 3N/7, 3N/7, 

Chrystal finds that (p, q, r) = (3/6, 2/6, 1/6) and (1/4, 2/4, 1/4). 


Problem V. From a bag containing three balls, each of which is white 
or black, two are drawn in succession, the first being replaced, to 
calculate the probability that whenever the first is white the second 
is white also. [p. 432] 


Following on from Problem IV the solutions 7/9 and 2/3 emerge under (A) 
and (B). 
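
The values quoted for Problems IV and V can be reproduced as follows. This is a sketch under the assumption that the initial frequencies of the constitutions 1°, 2°, 3° stand in the ratios 1 : 1 : 1 under (A) and 1 : 3 : 3 under (B); the function names are illustrative only.

```python
from fractions import Fraction

whites = [3, 2, 1]                           # constitutions 1°, 2°, 3° of the bag
p_white = [Fraction(w, 3) for w in whites]   # chance of white on a single draw

def posterior_after_white(freqs):
    """Chrystal's (p, q, r): share of the white draws furnished by each constitution."""
    weights = [Fraction(f) * pw for f, pw in zip(freqs, p_white)]
    total = sum(weights)
    return [w / total for w in weights]

def second_white_given_first(freqs):
    """Problem V: chance that the second draw (with replacement) is white when the first is."""
    return sum(post * pw for post, pw in zip(posterior_after_white(freqs), p_white))

freq_A = [1, 1, 1]   # Hypothesis (A)
freq_B = [1, 3, 3]   # Hypothesis (B), assumed ratios

post_A = posterior_after_white(freq_A)
post_B = posterior_after_white(freq_B)
print(post_A, second_white_given_first(freq_A))
print(post_B, second_white_given_first(freq_B))
```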

With these results as background material Chrystal turns his attention 
to Crofton’s demonstration of Laplace’s principles of inverse probability 
with particular reference to the following question: 


suppose an urn to contain three balls which are white or black; 
one is drawn and found to be white. It is replaced in the urn 
and a fresh drawing made; find the chance that the ball drawn 
is white. [p. 434] 


Crofton’s solution of 7; = 7/9 is stated by Chrystal to be the solution 
of (one case of) Problem V, rather than of the problem initially posed, 
inasmuch as Crofton 


deceives himself into believing that he has solved his problem 
by the merely arbitrary statement, that the probability 7, is 
the a posteriori (or modified probability) of the cause C,. It 
is, in reality, merely the probability that, when the event has 
happened, it happened from the cause C;, which is a totally 
different thing. [p. 434] 


While one must agree that the probability found is in fact a conditional one, 
it might well be queried whether Crofton thought he had found anything 
else. 

As a variation of the three-ball problem, and to illustrate the absurdity 
of the rules of inverse probability, Chrystal considers the following example: 


A bag contains three balls, each of which is either white or 
black, all possible numbers of white being equally likely. Two 
at once are drawn at random and prove to be white: what is 
the chance that all the balls are white? [p. 435] 


Chrystal’s “common sense” solution runs as follows: 


Any one who knows the definition of mathematical probability, 
and who considers this question apart from the Inverse Rule, 
will not hesitate for a moment to say that the chance is 1/2; 
that is to say, that the third ball is just as likely to be white as 
black. For there are four possible constitutions of the bag:— 
each of which, we are told, occurs equally often in the long-run, 
and among those cases there are two (1° and 2°) in which there 
are two white balls, and among these the case in which there 
are three white occurs in the long-run just as often as the case 
in which there are only two. [p. 435] 


Now this is a very curious solution: since there are initially more white 
balls in 1° than in 2°, one might well expect the answer to reflect this, and 
indeed that is just what emerges when one applies the inverse rules. For 
under these rules, argues Chrystal, there are only two possible constitutions 
of the bag, viz. 1° and 2°, each having a priori probability 1/2. The 
event consisting of the drawing of two white balls has for its probability 
under these hypotheses the values 1 and 1/3, and hence the a posteriori 
probabilities of 1° and 2° are 3/4 and 1/4, a result that Chrystal finds 
ridiculous”. 

If we look at the argument more closely, we find that Chrystal is sug- 
gesting the use of the hypothesis of an initial uniform distribution 


\[ \Pr[X = k] = 1/4, \qquad k \in \{0, 1, 2, 3\} \]

(where X denotes the number of white balls in the bag) rather than the 
hypothesis (B) he used before, in which $X \sim b(3, 1/2)$. If we denote by $C_i$ 
the ith constitution and by E the drawing of two white balls, then 

\[ \Pr[E \mid C_1] = 1 ; \qquad \Pr[E \mid C_2] = 1/3 \]

\[ \Pr[C_1 \mid E] = 3/4 , \qquad \Pr[C_2 \mid E] = 1/4. \]

Where Chrystal errs is in supposing that, after E, the constitutions 1° and 
2° are equally probable with chance 1/2. 
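
A short calculation sets the two answers side by side. The sketch below assumes a three-ball bag from which two balls are drawn together, and compares the uniform hypothesis of the problem with the binomial hypothesis (B); the observation that 1/2 arises under (B) is a by-product of the arithmetic rather than a claim made by Chrystal, and `posterior_all_white` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def posterior_all_white(prior):
    """Pr[three white | two balls drawn together are both white].

    `prior[k]` is the prior chance that the bag holds k white balls, k = 0..3.
    """
    # Chance that two balls drawn at once (without replacement) are both white.
    likelihood = [Fraction(comb(k, 2), comb(3, 2)) for k in range(4)]
    weights = [pr * lk for pr, lk in zip(prior, likelihood)]
    return weights[3] / sum(weights)

uniform = [Fraction(1, 4)] * 4                          # all numbers of white equally likely
binomial = [Fraction(comb(3, k), 8) for k in range(4)]  # hypothesis (B): X ~ b(3, 1/2)

print(posterior_all_white(uniform))
print(posterior_all_white(binomial))
```

Under the uniform hypothesis actually stated in the problem the inverse rule gives 3/4; the value 1/2 is recovered only if the binomial hypothesis is substituted for it.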

Chrystal argues further that the fallacy embodied in the inverse rules 
consists in the confusion of what we might write as $\Pr[C_i]$ with $\Pr[C_i \mid E]$, 
a confusion that in turn arises 


from neglect of the consideration that a probability is not un- 
ambiguously defined until the “series” of the “event” to which 
it relates has been given. [p. 436] 


He suggests further that Laplace’s two principles be written in the form 

\[ \Pr[C_i \mid E] = \Pr[E \mid C_i] \, \Pr[C_i] \Big/ \sum_{j} \Pr[E \mid C_j] \, \Pr[C_j] \]

and 

\[ \Pi = \sum_{i} (\Pr[E \mid C_i])^2 \, \Pr[C_i] \Big/ \sum_{i} \Pr[E \mid C_i] \, \Pr[C_i], \]

where $\Pi$ is the probability of one further occurrence of E after it has 
occurred once. To these formulations no exception can of course be taken, 
and one may be sure that Chrystal’s interpretation is indeed that intended 
by Laplace. 

As a further example of the unreasonableness of inverse probability 
Chrystal considers the following situation: 


A bag contains five balls which are known to be either all black 
or all white — and both these are equally probable. A white 
ball is dropped into the bag, and then a ball is drawn out at 
random and found to be white. What is now the chance that 
the original balls were all white? [p. 437] 


Chrystal’s answer is that the chance is still 1/2, unlike the solution obtained 
by Whitworth [1878, p. 151] of 6/7. This latter answer is interpreted by 
Chrystal as follows: 


if you were to drop a ball among the five a great many times, 
and draw one out again, then in about 6/7ths of the times that 
you got a white ball you would get it from a bag in which 
all the balls are white. About this there is nothing mysterious 
whatever; but it is not the meaning of the question as it stands. 


[p. 437] 


The distinction is clear: Chrystal is concerned with an absolute and Whit- 
worth with a conditional probability. 
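
The two figures in dispute are easily exhibited together. The following sketch assumes only what the problem states: a bag of five balls, all white or all black with equal chance, to which one white ball is added before a single draw. The dictionary keys are labels chosen here for convenience.

```python
from fractions import Fraction

# Prior chances for the original five balls.
prior = {"white": Fraction(1, 2), "black": Fraction(1, 2)}
# Chance of drawing white once the extra white ball is in the bag (six balls in all).
p_draw_white = {"white": Fraction(6, 6), "black": Fraction(1, 6)}

joint = {h: prior[h] * p_draw_white[h] for h in prior}
p_white_drawn = sum(joint.values())
posterior_white = joint["white"] / p_white_drawn

print(prior["white"])    # the absolute probability on which Chrystal insists
print(posterior_white)   # the conditional probability found by Whitworth
```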
The theory of inverse probability is finally dismissed as follows: 


both from the point of view of practical common-sense, and 
from the point of view of logic, the two so-called laws of Inverse 
Probability are a useless appendage to the first principles of the 
Theory of Probability, if indeed they be not a flat contradiction 
of those very principles. [p. 438] 


Chrystal’s attack’ on inverse probability (one might even refer to it as a 
diatribe) did not pass unchallenged. In 1920, in a paper entitled “On some 
disputed questions of probability”, E.T. Whittaker (1873-1956) considers 
the variation of the three-ball problem discussed by Chrystal, changing 
it, to intensify the effects, to a bag containing 1,000,001 balls, each either 
white or black, and all possible numbers of white balls equally likely a 
priori. If 1,000,000 balls are drawn, and all are found to be white, there is 
clearly an overwhelming probability that the remaining ball is also white. 
Whittaker presents both a “common-sense” argument and a frequency one 
to confute Chrystal, and argues further that considerations analogous to 
those presented by the latter are correctly applied in the following instance: 


An urn A contains a very large number of white balls, and the 
same number of black balls; from it n balls are drawn at random 
and placed in a second urn B without being examined. From B 
(n — p) balls are drawn (without being replaced) and are found 
to be all white. What is the probability that the next ball drawn 
from B will be white? [p. 167] 


Arguing from the assumption that all constitutions of B are equally likely, 
Whittaker deduces from “Bayes’s formula” that the required probability is 
es 

He also deduces, in the usual manner, the formula 


\[ \int_0^1 x^{m+1}(1-x)^n v(x)\,dx \Big/ \int_0^1 x^m (1-x)^n v(x)\,dx \qquad (16) \]


An unusual facet of his derivation is the interpretation of this as the probability 
that a person aged s will die before attaining age (s + 1), given that 
of (m + n) persons alive at age s, (m + 1) die before attaining age (s + 1), 
with $v(x)\,dx$ denoting the probability that the facility lies between $x$ and 
$x + dx$. As a limiting case it is supposed that $v(x) = 1$, in which case (16) 
reduces to 

\[ (m+1)/(m+n+2). \qquad (17) \]


Since, however, it is almost inconceivable that anybody could 
be in the position of having no a priori knowledge whatever 
regarding mortality, the formula [17] has no practical value; the 
really important formula is [16]. [pp. 169-170] 


He suggests too that, as an approximation, one might well use m/(m +n). 

In the discussion of Whittaker’s paper, J.R. Armstrong suggests that 
Chrystal’s paper should not be viewed merely as an attack on the Bayes- 
Laplace theory. Rather, its aims are threefold: (i) a reiteration of Venn’s 
criticism of mathematical probability as a calculus of belief, (ii) a criticism 
of certain (then current) interpretations of results obtained by a cavalier 
application of Bayes’s formula, and (iii) a protest against the use of the 
formula where such use is illegitimate. As regards (i) Armstrong sides with 
Venn and Chrystal; as far as (iii) is concerned he notes that such enlivening 
problems only become amenable to the Bayes-Laplace theory “by a process 
of abstraction that deprives them of all their specific content” [p. 199], 
while in connexion with (ii) he in the main stresses the importance of a 
clear distinction between absolute and conditional probabilities. 

This last point is also stressed by W.L. Thomson, in the discussion, while 
the president, A.E. Sprague, in his concluding speech said 


I speak as an old pupil of the late Professor Chrystal, and with 
great diffidence and great respect, but I am sorry to say that 
I cannot make out from his paper precisely what his meaning 
was, and I think that his arguments as stated therein are open 
to criticism in various directions. [p. 202] 

My own view is that Professor Whittaker’s guns in the contest 
have outclassed Professor Chrystal’s and Mr. Thomson’s. 
[p. 203] 


In his reply to the discussion Whittaker defends his opinions against 
Armstrong and Thomson’s defence of Chrystal, stressing that if an event 
E can occur only as a result of one and only one of the causes $A_1, A_2, \ldots$, 
then to say that “when E happens, it happens as a result of $A_i$” is surely 
equivalent to saying that $A_i$ exists. 

Hard on the heels of Whittaker’s paper followed one by John Govan, en- 
titled “The theory of inverse probability, with special reference to Professor 
Chrystal’s paper ‘On some fundamental principles in the theory of 
probability’”. This paper, although not published until 1920, had in fact been 
read before the Actuarial Society of Edinburgh in 1893: it was apparently 
published at Whittaker’s suggestion. 

Govan first considers the variation on the three-ball problem discussed 
by Chrystal. Under a long-run frequency interpretation it is argued that 
the desired answer is indeed 3/4. Chrystal’s five-ball problem is examined, 
and Whitworth’s solution of 6/7 is confirmed. Furthermore, the usual form 
of the rule of succession (i.e. (m+ 1)/(m+ n+ 2)) is derived in the case 
of sampling from an urn of indefinitely large size when the proportion p of 
white to black balls in the urn is unknown, but is uniformly distributed. 
Govan extends this example to the case in which (m + n) draws that resulted 
in m white and n black balls were preceded by (m’ + n’) draws 
yielding m’ white and n’ black balls. In this case the a priori probability 
of p (before the (m + n) draws) is no longer dp but 

\[ \frac{(m'+n'+1)!}{m'!\,n'!}\, p^{m'} (1-p)^{n'}\, dp, \]

and the probability that the next draw will yield a white ball is found, as 
expected, to be 

\[ \frac{m+m'+1}{m+m'+n+n'+2}. \]
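
Govan's extension can be verified with exact arithmetic. The sketch below assumes non-negative integer counts and relies on the standard value of the integral of p^a (1 - p)^b over [0, 1], namely a! b! / (a + b + 1)!; the names `beta_integral` and `next_white` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def beta_integral(a, b):
    """Exact integral of p**a * (1 - p)**b over [0, 1], a and b non-negative integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def next_white(m, n, m_prev=0, n_prev=0):
    """Chance that the next draw is white after m white and n black draws,
    the prior being proportional to p**m_prev * (1 - p)**n_prev."""
    numerator = beta_integral(m + m_prev + 1, n + n_prev)
    denominator = beta_integral(m + m_prev, n + n_prev)
    return numerator / denominator

print(next_white(3, 2))          # uniform prior: the usual rule of succession
print(next_white(3, 2, 4, 1))    # preceded by 4 white and 1 black draws
```

The normalizing constant (m’ + n’ + 1)!/(m’! n’!) is the reciprocal of `beta_integral(m_prev, n_prev)` and cancels in the ratio, which is why it does not appear in `next_white`.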

Exception is in fact taken to most, if not all, of Chrystal’s arguments. 
Thus, for example, in discussing Chrystal’s Problem II Govan criticizes the 
assumption of hypothesis (B) that each ball is equally likely to be white or 
black: for how, he says, “can we suppose that, when we are told that one 
ball is white?” [p. 220]. If one supposes rather that one ball is white and 
each of the remaining two is equally likely to be white or black, then the 
possible constitutions arise with relative frequencies 1 : 2: 1 rather than 
Chrystal’s 3 : 3 : 1. A generalization of this problem is also provided. 


The fundamental error which vitiates nearly every conclusion in 
Professor Chrystal’s paper, is his denial of the fact that (in the 
class of problems here discussed) the result of every trial modi- 
fies our data, or series, to use his own term. ... In Problem III. 
for instance (Hypothesis (A)), the series as at first given puts 
the four possible constitutions on an equal footing. The result of 
the first trial makes the constitution three black impossible, but 
Professor Chrystal will not admit that, just as three black has 
become impossible, so three white has become more probable 
than, say, one white and two black. [p. 223] 


Govan next turns his attention to the following general problem: 


p is the ratio of white, q of black (p + q = 1), in an urn containing 
an indefinitely large number M of balls. N balls are drawn at 
random, N being a number very great in itself, but insignificant 
as compared with M. The proportion of white among the balls 
drawn will be p. [p. 223] 


To prove this Govan proceeds as follows: since M is large and N negligible 
as compared with M, the probability that the sample contains r white and 
N − r black balls, viz. 

\[ \binom{pM}{r} \binom{qM}{N-r} \Big/ \binom{M}{N}, \]

reduces to 

\[ \binom{N}{r} p^r q^{N-r}. \]

This expression being maximized by the setting of r = pN (approximately), 
the probability of the most probable ratio, p, in the drawing is 

\[ \binom{N}{pN} p^{pN} q^{qN}, \]

an expression that use of the Stirling-de Moivre formula reduces to 

\[ P = 1 \big/ \sqrt{2\pi pqN}. \]

It follows further that the probability of a deviation of ±x in the number 
of white balls drawn is 

\[ w(x) = P \exp(-x^2/(2pqN)), \]

and hence the expectation of the deviation from the most probable number, 
pN, of white balls will be approximately 

\[ \int_0^{pN} x\,w(x)\,dx + \int_0^{qN} x\,w(x)\,dx, \]

which is easily found to be 

\[ \sqrt{pqN/2\pi}\; [2 - \exp(-pN/2q) - \exp(-qN/2p)]. \]

For large N this behaves like $\sqrt{2pqN/\pi}$, and it follows that the ratio of this 
to N tends to zero as N tends to infinity, as asserted in the proposition. 
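
Govan's limiting expression may be compared numerically with the exact mean absolute deviation of the binomial law. The sketch below is an illustration only: the value p = 0.3 and the sample sizes are arbitrary choices, and the function names are introduced here.

```python
from math import exp, lgamma, log, pi, sqrt

def binomial_pmf(N, r, p):
    """Pr[r white in N draws], computed on the log scale to avoid overflow."""
    log_pmf = (lgamma(N + 1) - lgamma(r + 1) - lgamma(N - r + 1)
               + r * log(p) + (N - r) * log(1 - p))
    return exp(log_pmf)

def mean_abs_deviation(N, p):
    """Exact expectation of |r - pN| under the binomial law."""
    return sum(abs(r - p * N) * binomial_pmf(N, r, p) for r in range(N + 1))

def govan_limit(N, p):
    """The large-N expression sqrt(2pqN/pi)."""
    return sqrt(2 * p * (1 - p) * N / pi)

p = 0.3   # arbitrary illustrative value, 0 < p < 1
rows = []
for N in (100, 400, 1600):
    exact = mean_abs_deviation(N, p)
    rows.append((N, exact, govan_limit(N, p), exact / N))
    print(rows[-1])
```

The last entry of each row, the expected deviation divided by N, shrinks as N grows, in agreement with the proposition.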


9.15 William Matthew Makeham (1826-1891) 


In 1892 Makeham published, in volume 29 (1891) of the Journal of the 
Institute of Actuaries, a paper entitled “On the Theory of Inverse Proba- 
bilities.” The paper consists of five sections. 

In the first section Makeham declares his intent to use the word “chance” 
as signifying “a way of happening”, a meaning that he finds in Lubbock 
and Drinkwater-Bethune [c.1830, 5]. The term is to be distinguished from 
probability, about which the following is recorded: 


We cannot be said to be ignorant of the probability of a given 
event, for the term “probability” has no reference to the chances 
(for and against) actually existing, but only to our knowledge 
of them. The probability, therefore, can always be determined 
by calculation, provided, of course, that we possess the skill 
necessary for the purpose. [p. 243] 


In this same section Makeham cites Laplace’s “well-known formula in 
inverse probabilities”, viz. 


(m+1)/(m+n+2), 
a formula that is deduced under the following fundamental assumptions: 


first, that the ratio of chances, for and against, may have any 
value from 0 to 1; and, secondly, that all values within those 
limits are a priori equally probable. [p. 245] 


In an attempt to counter objections raised by G.F. Hardy as to the appli- 
cability of this formula to assurance, Makeham proposes to generalize the 
result. This generalization is undertaken in Section 2, the following situa- 
tion being considered: suppose that several urns are filled by withdrawing 
balls randomly from an urn containing a large number of white and black 
balls, the (known) ratio of white to total number being p and that of black 
to total number being q. Suppose further that in a particular filled urn the 
ratio of white to black balls is as p’ : q’. Makeham now gives the following 
definition: 


The quantity denoted by p is the limit towards which the un- 
known ratio p’ (in any particular urn) necessarily tends more 
and more to approximate as the number of balls contained in 
the urn is increased. [p. 246] 


The ratio p is then the antecedent, or a priori, probability of drawing a 
white ball from any urn; moreover, it is what Laplace terms “le milieu de 
probabilité” not only of all possible values of p’ in a specific urn, but also 
of the several values of p’ actually existing in the different urns”. 

Now to the problem in hand: suppose that (m + n) draws (with replacement) 
have been made from a specific urn, m balls being white. What is 
the probability $p_{m,n}$ of obtaining a white ball on the next draw? Makeham 
states further that 


p represents the a priori probability (before any drawings have 
yet been made); while $p_{m,n}$ represents the a posteriori probability 
(after the fact that m white and n black balls have been 
drawn has become known to the observer). [pp. 246-247] 


This may seem slightly in conflict with the earlier definition of $p_{m,n}$ (after 
all, is a predictive probability the same thing as a posterior probability?): 
it seems, however, from what follows that $p_{m,n}$ is intended in a predictive 
sense. 

Two postulates are established for the solution of this question [p. 247], 
viz. 


Postulate 1. If p = m/(m + n), then $p_{m,n}$ is also equal to m/(m + n), 
and to p. 


Postulate 2. In all other cases $p_{m,n}$ will necessarily lie between m/(m + n) 
and p. 


In defence of the first postulate, Makeham argues that if p = m/(m + n), 
the result of the (m + n) trials provides no reason for altering the estimate 
of the probability. As regards the second, since p is the milieu de probabilité 
of the possible values of p’ in the urn concerned, if m/(m + n) < p it is 
probably less than p’, and hence $m/(m+n) < p_{m,n}$. Further, since p is the 
milieu de probabilité of the values of p’ in the different urns, p’ is probably 
less than p (in the urn in question) if m/(m + n) < p, and so $p_{m,n} < p$. A 
similar argument may be applied if m/(m + n) > p, in which case it follows 
that $p < p_{m,n} < m/(m+n)$. 

It now follows that $p_{m,n}$ may be supposed to be given by’ 

\[ p_{m,n} = (m + rp)/(m + n + r) \qquad (18) \]

for some r > 0. This may alternatively be written 

\[ p_{m,n} = [m/(m+n) + ap]/(1 + a), \]

where a = r/(m + n). Now r may be shown to be independent of m + n, 
though it may well be a function of p, say $\psi(p)$. On interchanging m 
and n, and replacing p by q, we find that the probability $q_{n,m}$ of drawing 
a black ball on the next draw is 

\[ q_{n,m} = [n + \psi(q)\,q]/[m + n + \psi(q)]. \]

But since $q_{n,m} = 1 - p_{m,n}$ it follows that r either must be constant or must 
be symmetric in p and q. 

In order to examine Makeham’s expression for $p_{m,n}$ given in (18) let us 
recall the discussion in §4.5. Our aim is to find a (prior) density $f(x; r, p)$ 
on [0, 1] such that 

\[ \frac{\int_0^1 f(x;r,p)\, x^{m+1}(1-x)^n\,dx}{\int_0^1 f(x;r,p)\, x^{m}(1-x)^n\,dx} = \frac{m+rp}{m+n+r}. \qquad (19) \]

If m = n = 0 then (19) reduces to 

\[ \int_0^1 f(x;r,p)\,x\,dx \Big/ \int_0^1 f(x;r,p)\,dx = rp/r = p, \]

and hence, since f is a density, 

\[ \int_0^1 x f(x;r,p)\,dx = p; \qquad (20) \]

that is, the mean of f is p. 
Next, for m = 1 and n = 0, we obtain, from (19), 

\[ \int_0^1 f(x;r,p)\,x^2\,dx \Big/ \int_0^1 f(x;r,p)\,x\,dx = \frac{1+rp}{1+r}, \]

the combination of this result with (20) yielding 

\[ \int_0^1 x^2 f(x;r,p)\,dx = \frac{1+rp}{1+r}\,p. \]

Thus 

\[ \text{Variance} = \int_0^1 x^2 f(x;r,p)\,dx - \left( \int_0^1 x f(x;r,p)\,dx \right)^2 = p(1-p)/(1+r). \qquad (21) \]

Similarly it follows that 

\[ \mu_{m+1} = \int_0^1 x^{m+1} f(x;r,p)\,dx = \frac{m+rp}{m+r}\,\mu_m. \]


Suppose we determine on a beta prior density $g(x) \propto x^{\alpha}(1-x)^{\beta}$. On 
equating the mean of this density to the value p in (20) we get 

\[ p = \int_0^1 x\,g(x)\,dx = (\alpha+1)/(\alpha+\beta+2). \qquad (22) \]

Similarly, using (21) and the expression for the variance of a beta distribution, 
one finds that 

\[ \frac{p(1-p)}{1+r} = \frac{(\alpha+1)(\beta+1)}{(\alpha+\beta+2)^2(\alpha+\beta+3)}. \qquad (23) \]

From (22) and (23) it then follows that 

\[ \alpha = rp - 1, \qquad \beta = r(1-p) - 1. \]


Alternatively, if the supposition preceding (22) is not to one’s liking, one 
may note that 

\[ \mu_m = \frac{(m-1+rp)\cdots(1+rp)}{(m-1+r)\cdots(1+r)}\,p
        = \frac{(m-1+rp)\cdots(1+rp)(rp)}{(m-1+r)\cdots(1+r)\,r}
        = \frac{\Gamma(m+rp)}{\Gamma(m+r)} \cdot \frac{\Gamma(r)}{\Gamma(rp)}, \]

which is recognizable as the mth moment about the origin of a random 
variable with density proportional to $x^{rp-1}(1-x)^{r(1-p)-1}$. 

Thus the choice of prior $f(x;r,p) \propto x^{rp-1}(1-x)^{r(1-p)-1}$ yields the 
desired result 

\[ \frac{\int_0^1 x^{rp-1}(1-x)^{r(1-p)-1}\, x^{m+1}(1-x)^n\,dx}{\int_0^1 x^{rp-1}(1-x)^{r(1-p)-1}\, x^{m}(1-x)^n\,dx} = \frac{m+rp}{m+n+r}. \]


The following special cases are worth noting: 


r | p   | prior 
0 | —   | Haldane: $x^{-1}(1-x)^{-1}$ 
1 | 1/2 | Jeffreys-Perks: $x^{-1/2}(1-x)^{-1/2}$ 
2 | 1/2 | uniform 
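
The agreement between Makeham's formula (18) and the predictive probability under a beta prior with exponents rp − 1 and r(1 − p) − 1 can be checked numerically. The following is a sketch only, assuming r > 0 and 0 < p < 1 so that the beta functions are finite; the function names are hypothetical.

```python
from math import gamma

def makeham(m, n, r, p):
    """Makeham's generalized formula (18)."""
    return (m + r * p) / (m + n + r)

def beta_fn(a, b):
    """Euler's beta function B(a, b), expressed through gamma functions."""
    return gamma(a) * gamma(b) / gamma(a + b)

def predictive_from_beta_prior(m, n, r, p):
    """Ratio of integrals for the prior x**(r*p - 1) * (1 - x)**(r*(1 - p) - 1)."""
    a, b = r * p, r * (1 - p)
    # numerator integrand:   x**(a + m) * (1 - x)**(b + n - 1)
    # denominator integrand: x**(a + m - 1) * (1 - x)**(b + n - 1)
    return beta_fn(a + m + 1, b + n) / beta_fn(a + m, b + n)

for r, p in ((1, 0.5), (2, 0.5), (4, 0.25)):   # Jeffreys-Perks, uniform, and one further case
    print(r, p, makeham(3, 2, r, p), predictive_from_beta_prior(3, 2, r, p))
```

With r = 2 and p = 1/2 both expressions return Laplace's (m + 1)/(m + n + 2).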


In Section 3 the general formula (18) is compared with Laplace’s formula, 
the expressions of course coinciding for p = 1/2 and r = 2. Then, says 
Makeham, 

For convenience of calculation I propose, for the present, to 
assume a mean value in all cases for the function r, whatever 
may be the value of p, for which purpose it is evident that the 
mean value in question must be taken = 2. [p. 249] 


His choice of r yields the (approximate) generalized formula 

(m + 2p)/(m + n + 2), 


and the values thus obtained for n = 0, p = 0.01 and 0.99 are compared with 
those given by Laplace’s formula and by the “ordinary formula” m/(m+n) 
(a result Makeham attributes to James Bernoulli). 

In his fourth section Makeham turns to the four “Principes” given by 
Laplace in his Théorie analytique des probabilités, each of which is illus- 
trated by a coin-tossing example. Of note is the following remark: 


we shall find that, without exception, the application of each 
one of Laplace’s four fundamental principles (covering, as they 
do, the whole field of the doctrine of probabilities) involves an 
application of the inverse theory. [p. 445] 


The first principle may be formulated as follows: 
Pr[E & F] = Pr[E] Pr[F | E]. 


Applying this principle (or more accurately an extension of it) to repeated 
tosses of a coin, Makeham deduces that the probability of getting “heads” 
m times in succession is $p \prod_{j=1}^{m-1} p_j$, where p is the probability of “heads” on 
the first throw and 


$p_j$ = Pr[“heads” on (j + 1)th trial | “heads” on preceding j trials]. 

Makeham now asserts that the form of $p_j$ will vary 


according to the observer’s knowledge of the “actual ratio of 
chances”, that is, of the inherent tendency of the coin to fall 
head or tail [p. 446], 


and three cases are considered in support of this assertion: 

(i) $C_H : C_T :: (1+w) : (1-w)$, where $C_x$ denotes the chances for x; 
(ii) $p_j = (j + rp)/(j + r)$; 
(iii) either $C_H : C_T :: (1+w) : (1-w)$ or $C_T : C_H :: (1+w) : (1-w)$, 
and these two suppositions are equally likely a priori. 

The value of $p \prod_{j=1}^{m-1} p_j$ is found in these cases to be 

\[ \left(\frac{1+w}{2}\right)^m, \qquad \frac{1}{m+1}, \qquad \frac{1}{2}\left[\left(\frac{1+w}{2}\right)^m + \left(\frac{1-w}{2}\right)^m\right] \]

respectively (the second on taking r = 2 and p = 1/2). 
The second principle may be given thus: if F denotes a future event and 
E an observed event, and if Pr[E & F] and Pr[E] are determined a priori, 
then 

Pr[F | E] = Pr[E & F] / Pr[E]. 


In the notation adopted in the discussion of the first principle, one has 

\[ \Pr[m' \text{ heads} \mid m \text{ heads}] = p_m\, p_{m+1} \cdots p_{m+m'-1}. \]

For m’ = 1 this reduces to (m + rp)/(m + r), if the general formula is used, 
or (m + 1)/(m + 2), if Laplace’s formula is used. 

The third principle runs as follows: suppose that E, an observed event, 
can occur in conjunction with one (and only one) of n different causes 
$C_1, C_2, \ldots, C_n$. Then 

\[ \Pr[C_i \mid E] : \Pr[C_j \mid E] :: \Pr[E \mid C_i] : \Pr[E \mid C_j] \]

and 

\[ \Pr[C_i \mid E] = \Pr[E \mid C_i] \Big/ \sum_{j=1}^{n} \Pr[E \mid C_j]. \]


The example used to illustrate this principle shows that an equi-probable 
assumption is needed. 
The fourth principle states that, for a future event F, 


\[ \Pr[F \mid E] = \sum_{i=1}^{n} \Pr[C_i \mid E]\, \Pr[F \mid C_i]. \]


Here Makeham shows that the probability of a head given m heads in 
succession is 

\[ \frac{1}{2} \cdot \frac{(1+w)^{m+1} + (1-w)^{m+1}}{(1+w)^m + (1-w)^m}. \]


Some criticism of an example given by Laplace then follows†, together 
with the astute observation that 


for the correct solution of these inverse problems, everything depends 
upon the proper determination of the elementary values, 
that is, upon the correct analysis of the elementary hypotheses 
of equal antecedent probability. [p. 456] 


† This example is concerned with the drawing, with replacement, of balls from 
an urn of three balls, each of which is either white or black: m of these drawings 
yield m white balls. To determine the a posteriori probabilities of the possible 
constitutions of the urn, Makeham suggests that one should consider the prior 
probabilities as 1/8, 3/8, 3/8 and 1/8. 


The final section of the paper is devoted to a brief application of some of 
the preceding results to the problem of the use of observed mortality rates 
in assurance. 

Some comment on Makeham’s work seems necessary. Firstly, Laplace’s 
formula, viz. (m + 1)/(m+ n+ 2), applies only if the number of balls in 
the urn is infinite®°®, a condition not made explicit by Makeham. Secondly, 
the equating of Laplace’s expression to (18) results in 


\[ r = r(p) = (m-n)/[q(m+1) - p(n+1)]. \qquad (24) \]


Note that r(p) = r(q) implies that p = 1/2, in which case r = 2. But while 
the pair (r = 2, p = 1/2) certainly reduces Makeham’s formula to Laplace’s, 
so will many others: the condition r(p) = r(q) ensures uniqueness. 

It follows from (24) that, since 0 < p< 1, r must satisfy 


\[ 0 < (n-m)/(n+1) < r, \]


which not only sets some extra condition on the permissible values of r, 
but also requires that n exceed m (this requirement is in fact not met in 
the numerical example used by Makeham, in which n = 0). 
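
That the value of r in (24) does reduce (18) to Laplace's expression may be confirmed for particular figures. The sketch below uses the arbitrary illustrative values m = 3, n = 7 and p = 3/5; the function names are introduced here and are not Makeham's.

```python
from fractions import Fraction

def r_from_laplace(m, n, p):
    """The value of r given by (24); undefined when q(m + 1) = p(n + 1)."""
    q = 1 - p
    return Fraction(m - n) / (q * (m + 1) - p * (n + 1))

def makeham(m, n, r, p):
    """Makeham's formula (18)."""
    return (m + r * p) / (m + n + r)

def laplace(m, n):
    """Laplace's formula (m + 1)/(m + n + 2)."""
    return Fraction(m + 1, m + n + 2)

m, n, p = 3, 7, Fraction(3, 5)     # illustrative values only
r = r_from_laplace(m, n, p)
print(r, makeham(m, n, r, p), laplace(m, n))
```

For p = 1/2 the function returns r = 2 whatever the (unequal) values of m and n, in agreement with the remark that this pair reproduces Laplace's formula.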

One might also note that Laplace’s formula may be written in the form 


\[ \alpha [m/(m+n)] + (1-\alpha)(1/2) \]

where $\alpha = (m+n)/(m+n+2)$. Now Makeham states that 


the supposed value of p, or the antecedent probability, is 1/2 in 
Laplace’s investigation. [p. 249] 


It is perhaps difficult to reconcile this with the fundamental assumptions 
stated earlier in this section. Of course, if p is uniformly distributed over 
the unit interval, a number of statistics of p (e.g. its mean, median and 
mode) take the value 1/2; and in view of Makeham’s assertion that p is “le 
milieu de probabilité” of the possible values of p’ in a given urn, 


that is to say, p is a quantity such that the true value of p’ in 
any particular urn is just as likely to be above as below it 
[p. 246], 


we might well consider the median as the appropriate statistic. In this case 
p should be replaced by the median value m obtained from tables of the 
incomplete beta-function ratio $I_x(\alpha, \beta)$. 

A further short note by Makeham, “On a problem in probabilities”, appeared 
in the same volume of the Journal of the Institute of Actuaries. This 
note was devoted to the following problem: consider four urns of respective 
composition three white balls, two white and one black, one white and two 
black, three black balls. One of these urns having been chosen at random, 
m draws (with replacement) are made, and all result in white balls. Find 
the a posteriori probability that urn 1, 2,3 or 4 was chosen. The problem is 
solved in the usual way: all we might note here is Makeham’s justification 
of the choice of a uniform prior, viz. 


As the urn chosen may be any one of the four, it is evident that, 
a priori, there is precisely the same chance in favour of each of 
the four hypotheses in question. We have here, then, necessarily, 
the identical condition gratuitously assumed by Laplace in the 
solution of his well-known problem. [p. 475] 


(The reference is to p. 183 of the Théorie analytique des probabilités (first
edition): it may be found on p. 185 of the Œuvres complètes edition of
1886.)
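
The “usual way” of solving the four-urn problem may be sketched as follows. This is an illustration under the assumptions stated in the problem (a uniform prior over the four urns, draws with replacement); the function name `urn_posterior` is hypothetical and is not Makeham's.

```python
from fractions import Fraction


def urn_posterior(m: int) -> list[Fraction]:
    """Posterior probabilities of urns 1..4 after m white draws in succession.

    Urns hold 3, 2, 1 and 0 white balls out of three; each urn has prior 1/4.
    """
    white_fraction = [Fraction(3, 3), Fraction(2, 3), Fraction(1, 3), Fraction(0, 3)]
    prior = Fraction(1, 4)
    # Joint probability of choosing each urn and then drawing m whites from it.
    joint = [prior * w**m for w in white_fraction]
    total = sum(joint)
    return [j / total for j in joint]


posterior = urn_posterior(2)
```

For m >= 1 the posterior probabilities stand in the ratio 3^m : 2^m : 1 : 0, so that the all-white urn rapidly dominates as m increases.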

Makeham’s theory did not go unchallenged: in 1892 Edward L. Stabler 
published a paper in which he presented 


some considerations which I think will show that this formula
is not suitable for any application. [p. 240]


He agrees with Makeham’s formula
\[ p_{m,n} = (m + rp)/(m + n + r), \]
but states that


in this form the formula gives no more information as to the 
probability desired than was already evident from the nature of 
the case. [p. 240] 


Stabler takes exception to Makeham’s “proof” that 


r is “some undetermined constant independent of m+n”, and
either independent of p or “not affected by the substitution of
q, or 1 − p, for p.” [p. 240]


his counter-examples showing (i) that r may be affected by the replacement
of q by p, and (ii) that r may depend on m + n.

Turning his attention to the general case of sampling from an urn of N
(finite) white or black balls, in which urn the initial probability of drawing
a white ball is known to be p, Stabler shows that the probability $p_{m,n}$ that
the (m + n + 1)th draw will yield a white ball, after (m + n) draws have
resulted in m white and n black balls, is given by

\[ \frac{\sum_{s=0}^{N} \binom{N}{s} p^{N-s} (1 - p)^{s} (N - s)^{m+1} s^{n}}{N \sum_{s=0}^{N} \binom{N}{s} p^{N-s} (1 - p)^{s} (N - s)^{m} s^{n}}, \]

which is a generalization of an expression given earlier by Terrot.

Further criticism was raised by John Govan [1920]. Govan concentrated 
on Makeham’s “urn and balls” example, agreeing with Chrystal’s asser- 
tion that the problem, as stated, is simply indefinite, and he argued that 
Makeham had made a serious error in his solution. For, contra Makeham, 


if we know that each individual ball is equally likely to be white 
or black, we cannot know in addition that one ball is certainly 
black (unless we know further that one ball is certainly white), 
inasmuch as the one condition is incompatible with the other. 
[p. 228] 


Govan suggested that the following meaning might be attached to Makeham’s
problem: suppose we have $\binom{3}{r}$ bags, each containing r white and
(3 − r) black balls. Each bag is taken N times, and, on each of these occasions,
(s + t) draws are made (with replacement) from that bag. Then the
number of times that we get s white and t black balls is approximately


\[ N \binom{3}{r} \binom{s+t}{s} \left(\frac{r}{3}\right)^{s} \left(\frac{3-r}{3}\right)^{t}. \]

As r takes on the values 0, 1, 2, 3 in turn, we find the relative frequencies
(ignoring constants)


\[ 0^{s} 3^{t}, \quad 3 \cdot 2^{t}, \quad 3 \cdot 2^{s}, \quad 3^{s} 0^{t}. \]


The a posteriori relative frequencies, after the drawing of the m white balls,
are of the same form, with s replaced by s + m. If now s = 0 and t ≠ 0
(Makeham’s first hypothesis), we obtain the sequence


\[ 0, \quad 3 \cdot 2^{t}, \quad 3 \cdot 2^{m}, \quad 0, \]


which does not agree with Makeham’s solution. Further, if s ≠ 0 and t ≠ 0,
the sequence yields Makeham’s result only if s = t.


9.16 Henri Poincaré (1854-1912)® 


During the academic year 1893-1894 Poincaré gave a course of lectures on 
probability at the Sorbonne. His Calcul des probabilités of 1896 was based 
on these lectures®*, a second edition (which we shall consider here) ap- 
pearing in 1912. This book appears to contain all that Poincaré wrote of 
relevance to our present work. 

In Chapter IX, “Probabilités des causes”, Poincaré notes the arbitrari- 
ness inherent in the choice of a prior distribution in the following words: 


Quand on compare le nombre des cas possibles au nombre 
des cas favorables, on doit avoir soin que tous les cas soient 
également probables. La convention qui repose sur des regardés
comme également probables contiendra toujours un très large
degré d’arbitraire. [1912, p. 153]


Section 95 is headed “Formule de Bayes”, though the actual formula 
given in the text is not denoted in this way. Poincaré’s full statement runs 
as follows: 


Soient n causes différentes qui peuvent être mises en jeu $C_1$, $C_2$,
..., $C_n$; la probabilité pour que la cause $C_i$, si elle est mise en
jeu, produise l’événement A est $p_i$.
Si nous savions que $C_i$ est en jeu, nous pourrions affirmer que
la probabilité de A est $p_i$. Il faut supposer que deux causes ne
peuvent être mises en jeu simultanément. Avant l’événement,
chacune de ces causes avait une probabilité a priori que je
suppose donnée: la probabilité que la cause $C_i$ soit mise en jeu
étant $w_i$.
L’événement A a eu lieu: quelle est la probabilité que ce soit la
cause $C_i$ qui l’ait produit? [§95]


The answer to this question is given as
\[ \frac{w_i p_i}{w_1 p_1 + w_2 p_2 + \cdots + w_n p_n}, \]
and this is followed by some simple examples. Poincaré notes that several
hypotheses may be made about the $w_i$, supposed known a priori, and he
considers in detail the cases in which (a) all $w_i$ are equal, and (b) $w_i$ is
proportional to a binomial coefficient.
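
The formula of §95 and the two prior hypotheses just mentioned can be illustrated by a short computation. The sketch below is ours, not Poincaré's: the function name `posterior_over_causes` and the particular urn example (four causes, cause k producing the event with probability k/3) are assumptions chosen only to make the arithmetic concrete.

```python
from fractions import Fraction
from math import comb


def posterior_over_causes(w, p):
    """Return w_i p_i / (w_1 p_1 + ... + w_n p_n) for every cause i."""
    joint = [wi * pi for wi, pi in zip(w, p)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical example: cause k (k = 0, 1, 2, 3) produces the event A
# with probability k/3.
p = [Fraction(k, 3) for k in range(4)]

# (a) all prior probabilities equal
equal_prior = [Fraction(1, 4)] * 4
# (b) prior probabilities proportional to the binomial coefficients C(3, k)
binomial_prior = [Fraction(comb(3, k), 8) for k in range(4)]

posterior_equal = posterior_over_causes(equal_prior, p)
posterior_binomial = posterior_over_causes(binomial_prior, p)
```

Under hypothesis (a) the posterior probabilities are proportional to the $p_i$ themselves; under (b) the binomial weights pull the posterior towards the middle causes.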

Poincaré next examines the case of a sequence of games of chess between 
two players A and B, A having won n and B m games, with n > m. In 
ignorance of the probability P that A will win the next game, one must 
suppose a priori that 


\[ \Pr[p < P < p + dp] = f(p)\,dp, \]
where f(p) is unknown. Then “la probabilité a posteriori que p [our P] est
compris entre p et p + dp” [§101] is


\[ \varphi(p)\,dp = \frac{(n + m + 1)!}{n!\,m!}\, p^{n} (1 - p)^{m}\,dp, \]
where it is assumed that f(p) = 1. Indeed,


On fait généralement l’hypothèse f(p) = 1, faute d’autres
renseignements. [§101]


In the next section Poincaré integrates $p\,\varphi(p)\,dp$ to obtain the probability
of A’s winning, obtaining the answer
\[ \frac{n + 1}{n + m + 2}. \]
This is followed by the remark


Si j’avais appliqué le même raisonnement à un jeu de hasard,
je n’aurais pas eu le droit de supposer f(p) = 1. A priori, en
effet, p devait égaler 1/2. Donc f(p) devait être infini pour p = 1/2.
[§102]


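This seems to suggest that, when Pr[P = 1/2] = 1, f(p) is a sort of impulse
function®, i.e.

\[ f(1/2) = \infty, \qquad f(x) = 0 \quad (x \neq 1/2), \]

\[ \int_{1/2 - \varepsilon}^{1/2 + \varepsilon} f(p)\,dp = 1 \quad \text{for any } \varepsilon > 0. \]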
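
Returning to the chess example of §§101-102, the integration that yields (n + 1)/(n + m + 2) can be verified exactly, since for integer exponents the integral of p^a (1 − p)^b over the unit interval is a! b!/(a + b + 1)!. The sketch below assumes the uniform prior f(p) = 1; the names `beta_integral` and `prob_a_wins_next` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of p**a * (1 - p)**b over [0, 1], for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def prob_a_wins_next(n: int, m: int) -> Fraction:
    """Posterior mean of p when A has won n games and B has won m (uniform prior)."""
    # Integral of p * p^n (1-p)^m, divided by the normalizing integral of p^n (1-p)^m.
    return beta_integral(n + 1, m) / beta_integral(n, m)


for n, m in [(5, 3), (1, 0), (10, 2)]:
    assert prob_a_wins_next(n, m) == Fraction(n + 1, n + m + 2)
```

With an impulse prior concentrated at 1/2, as in Poincaré's remark on games of chance, the same calculation would of course return 1/2 whatever the record of wins.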


In Chapter X, “La théorie des erreurs et la moyenne arithmétique”,
Poincaré supposes that a sequence of measurements $x_1, x_2, \ldots, x_n$ is made
of a certain quantity. To find the probability that the true value Z lies
between z and z + dz he supposes that


\[ \Pr[x_1 < \text{le résultat de l’observation} < x_1 + dx_1] = \varphi(x_1, z)\,dx_1. \]
Using the prior probability
\[ \Pr[z < Z < z + dz] = \psi(z)\,dz, \]


with “ψ étant une fonction qui dépendra de ce que nous savons sur z”
[§107], Poincaré deduces that the required (posterior) probability is


\[ \frac{\psi(z)\,\varphi(x_1, z)\,\varphi(x_2, z) \cdots \varphi(x_n, z)\,dz}{\int \psi(z)\,\varphi(x_1, z)\,\varphi(x_2, z) \cdots \varphi(x_n, z)\,dz}. \]


Following Gauss [1809], Poincaré now assumes that
\[ \psi(z) = k, \qquad \varphi(x_i, z) = \varphi(z - x_i), \]
where k is a constant, and chooses φ by requiring that


\[ \varphi(x_1, z)\,\varphi(x_2, z) \cdots \varphi(x_n, z) \]


be maximized when z coincides with the arithmetic mean $\bar{x}$. This of course
results in a Normal distribution as the posterior distribution.

This matter receives further attention in Chapter XII, “Erreurs sur la
situation d’un point”, where the following problem may be found:


Cherchons la probabilité pour que les coordonnées du point
soient comprises entre x et x + dx, y et y + dy. [§156]


Poincaré gives the solution as


\[ \frac{\Pi \psi\,dx\,dy}{\int \Pi \psi\,dx\,dy}, \]


where, in the notation introduced earlier in his discussion of Bayes’s
formula,


\[ w_i = \psi(x, y)\,dx\,dy, \]


\[ p_i = \varphi(\xi_1, \eta_1)\,\varphi(\xi_2, \eta_2) \cdots \varphi(\xi_n, \eta_n)\,d\xi_1\,d\eta_1 \cdots d\xi_n\,d\eta_n, \]


and


\[ \Pi = \varphi(\xi_1, \eta_1)\,\varphi(\xi_2, \eta_2) \cdots \varphi(\xi_n, \eta_n). \]


Further application®* follows in Chapter XIV, “Calcul de l’erreur à craindre”,
where Poincaré states “Admettons la loi de Gauss”. In the notation
introduced before, he supposes that


\[ \varphi(x_i - z) = \sqrt{h/\pi}\,\exp[-h(x_i - z)^2] \]


and


\[ \Pr[z < Z < z + dz,\; h < H < h + dh] = \psi(z, h)\,dz\,dh. \]


The posterior probabilities are then


\[ \frac{\Phi \psi\,dz\,dh\,dx_1 \cdots dx_n}{dx_1 \cdots dx_n \int_0^{\infty}\!\int_{-\infty}^{\infty} \psi \Phi\,dz\,dh}, \]
where
\[ \Phi = \varphi(x_1 - z)\,\varphi(x_2 - z) \cdots \varphi(x_n - z), \]


and


\[ \Pr[h < H < h + dh] = \frac{h^{(n-1)/2} \exp(-nh\sigma^2)\,dh}{\int h^{(n-1)/2} \exp(-nh\sigma^2)\,dh}, \]


with $\sigma^2 = \sum_i (x_i - \bar{x})^2 / n$.


9.17 Hugh MacColl (1837-1909)


In the sixth of a series of papers under the general title On the calculus 
of equivalent statements, MacColl® discussed some questions in inverse 
probability®®. The first of these is the following: 

Problem 4. – Two intersecting circles A and B of areas a and
b respectively, and with an area c common to both, are enclosed
in a third circle E of area unity.
Let a point P be taken at random in E. If P happens to fall
in A, let a second point Q be taken at random in A; but, if P
does not happen to fall in A, let Q be taken at random in E.
Assuming (1) that Q falls in B, what is the chance that P had
fallen in A? And assuming (2) that Q does not fall in B, what
is the chance that P had fallen in A? [1897, p. 565]


“This,” MacColl then writes, “is a question in inverse probability” [p. 566]. 
Assuming a formula 


which in the following problem will be proved true for any state- 
ments a and @, whether or not these statements have reference 
to causes and consequences, [p. 566] 


(a formula recognizable as a discrete Bayes’s rule), he deduces that


\[ \Pr[P \in A \mid Q \in B] = \frac{\Pr[P \in A]\,\Pr[Q \in B \mid P \in A]}{\Pr[P \in A]\,\Pr[Q \in B \mid P \in A] + \Pr[P \notin A]\,\Pr[Q \in B \mid P \notin A]} = c/(c + b - ab) \]
and similarly that
\[ \Pr[P \in A \mid Q \notin B] = (a - c)/[(a - c) + (1 - a)(1 - b)]. \]
MacColl notes finally that, when the events [P ∈ A] and [Q ∈ B] are
independent, then
\[ \Pr[P \in A \mid Q \in B] = \Pr[P \in A \mid Q \notin B] = \Pr[A] = a. \]
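
MacColl's two answers to Problem 4 follow from Pr[P ∈ A] = a, Pr[Q ∈ B | P ∈ A] = c/a and Pr[Q ∈ B | P ∉ A] = b. A minimal sketch of the computation is given below; the function name `maccoll_problem4` is ours, and the numerical areas are arbitrary illustrative choices, not MacColl's.

```python
from fractions import Fraction


def maccoll_problem4(a: Fraction, b: Fraction, c: Fraction):
    """Return (Pr[P in A | Q in B], Pr[P in A | Q not in B]) by Bayes's rule."""
    like_in = c / a   # Q is uniform on A, so it falls in B with chance c/a
    like_out = b      # Q is uniform on E (area 1), so it falls in B with chance b
    given_q_in_b = a * like_in / (a * like_in + (1 - a) * like_out)
    given_q_not_in_b = a * (1 - like_in) / (
        a * (1 - like_in) + (1 - a) * (1 - like_out)
    )
    return given_q_in_b, given_q_not_in_b


a, b, c = Fraction(1, 2), Fraction(1, 4), Fraction(1, 5)
in_b, not_in_b = maccoll_problem4(a, b, c)
# Agreement with the closed forms c/(c + b - ab) and (a - c)/[(a - c) + (1 - a)(1 - b)].
assert in_b == c / (c + b - a * b)
assert not_in_b == (a - c) / ((a - c) + (1 - a) * (1 - b))
```

When c = ab, so that the events are independent, both values reduce to a, as MacColl remarks.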


The fifth problem is devoted to a proof of the above-mentioned formula, 
the formula for inverse probability 


CC (C, V Cr V 
vi(Sa)/LSe 
(in MacColl’s notation®’) being deduced from this. 


We have already mentioned the question of time-order in connexion with 
Bayes’s Theorem®®. Having given the fundamental formula 


\[ \Pr[ABCD\ldots] = \Pr[A]\,\Pr[B \mid A]\,\Pr[C \mid A \wedge B]\,\Pr[D \mid A \wedge B \wedge C] \cdots, \]

FIGURE 9.1. MacColl’s sketch for random choice of points.


MacColl notes that 


it is not necessary to assume that the event asserted by the 
statement A precedes in the order of time the event asserted 
by the statement B; that the event asserted by B takes place 
before that asserted by C; and so on. In whatever time order 
the events may occur, and whether or not they are mutually 
independent, the formula always holds good; and it will still 
hold good if we interchange any two of the letters. 

[1897, p. 567] 


The next problem is strictly speaking not one in inverse probability; 
however it does purport to find a prior probability: 


Problem 6. — Out of a very large (say, infinite) collection of 
problems in probability, with the correct answers to the required 
chances ranging in arithmetical progression between 0 and 1, a 
problem is taken at random. What is the a priori probability,
before the problem is known, that the event whose chance is 
required in it will, upon trial, happen m times out of n? 


[1897, p. 568] 


(The requirement that the chances are arranged in an arithmetic progression
is later stated to be unnecessary: all that is wanted is for “the correct
answers to be distributed irregularly and at random, but on an average
evenly, between 0 and 1” (loc. cit.).)

Denoting by $P_x$ the assertion®® that the correct answer to the randomly
chosen problem is x and letting


V predict that the event whose chance is required in the random
problem will, upon trial, happen exactly m times out of n,
[1897, p. 568]


MacColl deduces that?® $V = (P_{dx} + P_{2dx} + \cdots + P_1)V$, and hence

\[ \begin{aligned}
\Pr[V] &= \Pr[P_{dx}]\,\Pr[V \mid P_{dx}] + \cdots + \Pr[P_1]\,\Pr[V \mid P_1] \\
&= dx\,(\Pr[V \mid P_{dx}] + \cdots + \Pr[V \mid P_1]) \\
&= \int_0^1 \Pr[V \mid P_x]\,dx \\
&= \int_0^1 \binom{n}{m} x^{m} (1 - x)^{n-m}\,dx \\
&= \frac{1}{n + 1},
\end{aligned} \]
independent of m.
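
That the answer to Problem 6 is 1/(n + 1) for every m can be confirmed exactly, using the fact that the integral of x^m (1 − x)^(n−m) over the unit interval equals m! (n − m)!/(n + 1)!. The following sketch is offered only as a check; the function name `prior_predictive` is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def prior_predictive(m: int, n: int) -> Fraction:
    """Pr[V]: chance of exactly m successes in n trials under a uniform prior on x."""
    # Exact value of the integral of x^m (1 - x)^(n - m) over [0, 1].
    beta_integral = Fraction(factorial(m) * factorial(n - m), factorial(n + 1))
    return comb(n, m) * beta_integral


n = 6
values = [prior_predictive(m, n) for m in range(n + 1)]
# Every one of the n + 1 possible outcomes is equally likely a priori.
assert all(v == Fraction(1, n + 1) for v in values)
```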
Following on from this result we find the following: 


Problem 7. — A mathematician solved a question in probability 
and found the required chance to be c. To test this result he 
had recourse to experiment and found that the event in question 
happened m times out of n trials. What is the chance of c being 
the correct answer, assuming (1) that the a priori chance of
his being right, independently of the experiment, is a (that is, 
out of n problems he correctly solves na on an average); and 
(2) that the problem was taken at random out of a very large 
(say, infinite) number of problems of which the required chances 
ranged at random between 0 and 1 — high, low, and medium 
values between those limits being all equally probable? 

[1897, pp. 568-569] 


Letting $P_c$ assert that the correct chance is c, and V assert that the event
of interest occurs exactly m times in n trials, MacColl sets


\[ \Pr[P_c] = a, \qquad \Pr[P'_c] = 1 - a, \]


\[ M = \Pr[V \mid P_c] = \binom{n}{m} c^{m} (1 - c)^{n-m}. \]


Then, presumably in the same way as in the preceding problem,
\[ \Pr[V \mid P'_c] = \int_0^1 \Pr[V \mid P_x]\,dx = \frac{1}{n + 1}. \]
By Bayes’s rule it follows that
\[ A = \Pr[P_c \mid V] = \frac{aM}{aM + (1 - a)/(n + 1)}. \]


The next problem seems to be connected with the rule of succession:


Problem 8. – From the same data as in Problem 7, with the
experiment V added as an a priori, what is the chance that the
event, whose probability the mathematician had concluded to
be c, will happen on the (n + 1)th trial? [1897, p. 571]


With $P_c$, $P'_c$ and V as defined before, and with Q asserting that the event
that has already happened m times in n trials will happen again on the
(n + 1)th trial, we have

Q= P.QV P.Q 


and hence 
\[ \Pr[Q] = \Pr[P_c]\,\Pr[Q \mid P_c] + \Pr[P'_c]\,\Pr[Q \mid P'_c]. \]


Since V is now supposed to have occurred, $\Pr[P_c] = \Pr[P_c \mid V]$: further,


1 
Pr(Q|P!] =) pdr se, 
0 


and hence 


9.18 Karl Pearson (1857-1936) 


In a working life so richly productive of statistical innovation as that of 
Karl Pearson’!, particularly in the biometrical field, one might well expect 
to find little time devoted to matters of historical or philosophical concern. 
Pearson’s interest in statistics (and science) in general, however, was such 
as to lead him to not inconsiderable speculation on these matters??, and 
among his voluminous writings®® eight have been singled out as bearing on 
the present investigations. 

The first of Pearson’s works that is pertinent is his justly celebrated 
The Grammar of Science (first published?* in 1892), a work that Haldane 
[1957] regards as Pearson’s “main contribution to philosophy”. In Chapter 
4, entitled “Cause and Effect. Probability”, we find in Section 13, headed 
“Probable and Provable”, a discussion of the rule of succession phrased in 
the following words: 


A certain order of perceptions has been experienced in the past, 
what is the probability that the perceptions will repeat them- 
selves in the same order in the future? [p. 168] 


Pearson’s belief in the frequency interpretation of probability is borne out
by his further statement


The probability is conditioned by two factors, namely: (1) In
most cases the order has previously been very often repeated,
and (2) past experience shows us that sequences of perceptions
are things which have hitherto repeated themselves without fail.
[p. 168]


He states further Laplace’s assertion that the probability of the further 
occurrence of an event that has already occurred p times and has not been 
known to fail, is (p+ 1)/(p+ 2), and illustrates this result by considering 
(a) the further solidification of hydrogen after one such success, and (b) the 
further rising of the sun after a million dawns®°. Believing that the num- 
bers obtained in these two cases “do not in the least represent the degrees 
of belief of the scientist regarding the repetition of the two phenomena” 
[p. 169], Pearson®® argues that the problem ought rather to be posed as 
follows: 


p different sequences of perception have been found to follow
the same routine, however often repeated, and none have been 
found to fail, what is the probability that the (p+ 1)th sequence 
of perceptions will have a routine? Laplace’s theorem shows us 
that the odds are (p+ 1) to one in favour of the new sequence 
having a routine. [p. 169] 


In Section 14, “Probability as to Breaches in the Routine of Percep- 
tions”, Pearson points out that Laplace’s result permits one to take account 
of “possible ‘miracles’, anomies, or breaches of routine in the sequence of 
perceptions” [p. 170] (perhaps all of these are covered by the second term). 
He concludes that one is justified in saying that miracles have been proved 
incredible, where “proved” is interpreted as the establishment of an over- 
whelming probability in favour of. 

In Section 15, “The Bases of Laplace’s Theory lie in an Experience as 
to Ignorance”, Pearson turns his attention more closely to Laplace’s result, 
drawing an analogy between the world of perceptions (divided into routine- 
order and anomy) and a bag containing white and black balls. Writing 
further of a coin-tossing set-up, Pearson mentions the following Laplacean 
principle: 


“If a result might flow from any one of a certain number of differ- 
ent constitutions, all equally probable before experience, then 
the several probabilities of each constitution after experience 
being the real constitution, are proportional to the probabili- 
ties that the result would flow from each of these constitutions.” 
[pp. 173-174] 


and in expanding further on its use he emphasizes the role played by ex- 
perience in the determination of a priori probabilities. 

In Section 16, “Nature of Laplace’s Investigation”, Pearson returns to
his “nature bag” example, supposing no longer that routine and breach of
routine are equally probable, but rather that every possible ratio of black
to white balls is equally likely’. He then deduces an expression of the form


\[ \Pr[\text{white}] = \sum_i \Pr[\text{white} \mid \text{constitution } i]\,\Pr[\text{constitution } i], \]
and points out that Laplace’s result follows. A particular case is discussed 
in the following section, “The Permanency of Routine for the Future”. 
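
How Laplace's result emerges from the “nature bag” may be illustrated numerically. The sketch below is our own discretization, not Pearson's argument: it assumes a bag whose proportion of white balls is one of the N + 1 values 0, 1/N, ..., 1, all equally likely, and it sums over these constitutions exactly as in the expression above. The function name `next_white_probability` is hypothetical.

```python
from fractions import Fraction


def next_white_probability(p: int, N: int) -> Fraction:
    """Chance of a further white draw after p white draws (with replacement).

    The bag's white proportion is i/N for i = 0..N, each with prior 1/(N + 1).
    """
    ratios = [Fraction(i, N) for i in range(N + 1)]
    prior = Fraction(1, N + 1)
    # Pr[p whites] and Pr[p + 1 whites], each summed over the constitutions.
    evidence = sum(prior * x**p for x in ratios)
    joint = sum(prior * x ** (p + 1) for x in ratios)
    return joint / evidence


approx = float(next_white_probability(3, 1000))
laplace = (3 + 1) / (3 + 2)
```

As N increases the discrete sum approaches the integral used by Laplace, and the value tends to (p + 1)/(p + 2).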

In some measure The Grammar of Science is still pertinent to modern
science’®, but one must agree with Haldane [1957] that “the discussion of
probability and statistical method in the first edition of The Grammar of
Science is superficial”.

We now come to Pearson’s papers, the first relevant one of which was 
written with Filon and published in 1898 (read on the 25th of November 
1897). This paper, entitled “On the Probable Errors of Frequency Con- 
stants and on the Influence of Random Selection on Variation and Correla- 
tion” formed the fourth part of “Mathematical Contributions to the Theory 
of Evolution.” Commenting on this paper, E.S. Pearson [1967] writes 


The basis of the approach used here is a little obscure and 
there seems to be implicit in it the classical concept of inverse 
probability. [p. 347] 


A similar comment’? has been expressed by MacKenzie [1981, p. 241]. 

The main result of this paper (to be found in the second article) has
been reformulated by MacKenzie [1981, pp. 241-243] and Welch [1958] in
terms of inverse probability: we shall present a similar (but more general)
interpretation®®. Pearson and Filon show that if one considers the frequency
surface $z = f(x_1, \ldots, x_n; \eta_1, \eta_2, \ldots)$, where the $\eta_i$ are frequency
constants (i.e. means, standard deviations, &c.), then, on neglecting cubic
and higher terms in the deviations $\Delta\eta_i$, “the frequency surface giving the
distribution of the variations in the deviations” [p. 236] is


\[ P_\Delta = P_0 \exp\left\{ -\tfrac{1}{2} \left[ \sum a_{rr} (\Delta\eta_r)^2 - 2 \sum a_{rs}\,\Delta\eta_r\,\Delta\eta_s \right] \right\}, \]


where $P_0$ is a normalizing constant and


\[ a_{rr} = -\int \cdots \int f\,[d^2(\log f)/d\eta_r^2]\,dx_1 \ldots dx_n, \]


\[ a_{rs} = \int \cdots \int f\,[d^2(\log f)/d\eta_r\,d\eta_s]\,dx_1 \ldots dx_n. \]


The desiderata are


$\sigma_r$, the standard deviation of $\Delta\eta_r$, and $R_{rs}$, the coefficient of
correlation between $\Delta\eta_r$ and $\Delta\eta_s$ [p. 236],


the finding of which requires consideration of the (posterior) distribution
of the $\Delta\eta_i$ and hence specification of a prior.

As a specific illustration Pearson and Filon consider a random sample
$\{(X_i, Y_i)\}$ of size n drawn from the bivariate!°! Normal distribution
$N(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho)$. The joint density (of the data as a function of the
parameters) is then viewed as a density of the parameters in order to
determine things like the standard deviations of errors in $\sigma_x$, $\sigma_y$ and $\rho$. If
we denote the joint density by $f(S \mid \mu_x, \mu_y, \sigma_x, \sigma_y, \rho)$, where S denotes the
data, then


\[ f(S, \mu_x, \mu_y, \sigma_x, \sigma_y, \rho) = f(S \mid \mu_x, \mu_y, \sigma_x, \sigma_y, \rho)\,f(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho), \]


where f is used indiscriminately to denote a density function. The posterior
distribution of the parameters given the data is then found in the
usual manner. The choice of a uniform prior distribution for the parameters
yields a posterior distribution that is proportional to the likelihood, and
it is this latter function with which Pearson and Filon are concerned. The
standard deviations of the errors in the parameters given in this paper are
today well known.

The next paper demanding our attention is entitled “On the influence of
past experience on future expectation”: it was published in the Philosophical
Magazine in 1907 with the avowed aim of putting


into a new form the mathematical process of applying the prin- 
ciple of the stability of statistical ratios, and to determine, on 
the basis of the generally accepted hypothesis, what is the ex- 
tent of the influence which may be reasonably drawn from past 
experience. [p. 365] 


After pointing out inadequacies in common application of the principle, 
Pearson states!°? 


I start as most mathematical writers have done, with “the equal 
distribution of ignorance,” or I assume the truth of Bayes’ The- 
orem. [p. 366] 


If Pearson is equating “the equal distribution of ignorance” with Bayes’s 
Theorem then he is simply wrong. He goes on further to say “I hold this 
theorem not as rigidly demonstrated” [p. 366], and again he errs: granted 
the assumptions made by Bayes, the theorem is correct.

Pearson now passes on to the statement of Bayes’s Theorem, which he
gives as follows


\[ \Pr[x < X < x + \delta x \mid p \text{ occurrences of } E \text{ and } q \text{ failures of } E] = x^{p} (1 - x)^{q}\,\delta x \Big/ \int_0^1 x^{p} (1 - x)^{q}\,dx \]

“on the equal distribution of our ignorance” [p. 366]. The chance that in
a further m trials the given event E will occur r times and fail s = m − r
times is then


\[ C_r = \binom{m}{r} \int_0^1 x^{p+r} (1 - x)^{q+s}\,dx \Big/ \int_0^1 x^{p} (1 - x)^{q}\,dx, \]


and “This is, with a slight correction, Laplace’s extension of Bayes’
Theorem” [p. 367]. We have already commented on the correctness of this
assertion.

Noting that the usual method of evaluation involves, via beta-functions
and the Stirling-de Moivre approximation, the expression of $C_r$ in terms of
ordinates of the Normal distribution (an approach that later illustrations
in the paper show to be sometimes unsatisfactory), Pearson proposes to
use the hypergeometric series


\[ C_0 \left\{ 1 + \frac{m(p+1)}{1!\,(q+m)} + \frac{m(m-1)(p+1)(p+2)}{2!\,(q+m)(q+m-1)} + \frac{m(m-1)(m-2)(p+1)(p+2)(p+3)}{3!\,(q+m)(q+m-1)(q+m-2)} + \cdots \right\}, \]
whose successive terms give $C_r$, $r \in \{0, 1, \ldots, m\}$, with


\[ C_0 = \Gamma(q + m + 1)\,\Gamma(n + 2) / \Gamma(q + 1)\,\Gamma(n + m + 2). \]


Note that the term in braces in this series is the hypergeometric function
${}_2F_1(p + 1, -m; -(q + m); 1)$.
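
The series lends itself to direct computation, since each term is obtained from its predecessor on multiplication by (m − r)(p + r + 1)/[(r + 1)(q + m − r)]. The sketch below, which assumes integer p and q and takes n = p + q, builds the terms in this way with exact fractions; the function name `predictive_series` is ours and the routine is an illustration only, not Pearson's own procedure.

```python
from fractions import Fraction
from math import factorial


def predictive_series(p: int, q: int, m: int) -> list[Fraction]:
    """Terms C_0, ..., C_m of the hypergeometric series, for integer p, q, m."""
    n = p + q
    # C_0 = Gamma(q+m+1) Gamma(n+2) / [Gamma(q+1) Gamma(n+m+2)] for integer arguments.
    c0 = Fraction(
        factorial(q + m) * factorial(n + 1),
        factorial(q) * factorial(n + m + 1),
    )
    terms = [c0]
    for r in range(m):
        # Ratio C_{r+1} / C_r read off from successive terms of the series.
        ratio = Fraction((m - r) * (p + r + 1), (r + 1) * (q + m - r))
        terms.append(terms[-1] * ratio)
    return terms


terms = predictive_series(2, 3, 4)
# The C_r form a probability distribution over r = 0, ..., m.
assert sum(terms) == 1
```

With p = 1, q = 0 and m = 1 the two terms are 1/3 and 2/3, the latter being Laplace's (p + 1)/(p + 2).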

A detailed comparison of the moments of the hypergeometric series with 
those of the standard Normal distribution leads to the following conclu- 
sions: 


it is not possible in judging expectancy from past experience
(i.) to neglect the relative sizes of the first and second samples,
or (ii.) to neglect, even in characteristics which appear in 10
p.c. of the sample, the sensible deviation from the Gaussian
distribution. [p. 373]


A further conclusion drawn is the following: 


The frequency of future samples is given by a certain hypergeo- 
metrical series, which is not at all closely represented by the 
Gaussian curve except when the past experience is very large 
as compared with the proposed sample, and further the char- 
acteristic expected does not occur in either a very large or very 
small percentage of the population. [p. 378]


Some thirteen years later, in 1920, we find Pearson returning to this
question in his paper “The fundamental problem of practical statistics”, a
paper that in a sense is an amplification of that just discussed.

The question, stated to be “as ancient as Bayes” [p. 1], explored in this 
paper runs as follows: 


An “event” has occurred p times out of p + q = n trials, where
we have no a priori knowledge of the frequency of the event in
the total population of occurrences. What is the probability of
its occurring r times in a further r + s = m trials? [p. 1]


Pearson briefly discusses the contributions made by Bayes, Price, Con- 
dorcet and Laplace to the solution of this problem, and before adding his 
own solution he comments on criticism by Boole and Venn of inverse prob- 
ability, and notes also that 


Edgeworth returns to the appeal to experience from which Bayes 
and Laplace ought to have started. [p. 4] 


Pearson finds that those antagonistic to inverse probability generally 
attack two hypotheses used by Bayes, viz. 


(i) the hypothesis that a priori we ought to distribute our
ignorance of the chance of a marked individual occurring 
equally, 

(ii) the hypothesis that earlier occurrences do not modify the 
chance of later trials [p. 2], 


and in an attempt to divert the assault on the first hypothesis (the one that 
is usually attacked), he proposes to investigate whether any continuous 
distribution of a priori chances would lead to the same result.

To this end let a stroke be made at random on a line of length a, the
position of this stroke at distance x from one end being known. A further n
strokes are now made at random on the line, p falling in the segment 0 to x
and q = n − p in x to a. Unlike Bayes, Pearson now supposes the probability
density function for the strokes to be given by y = φ(x)/a, where φ is any
continuous function. Denoting by X the chance of a stroke, we have


\[ \Pr[x < X < x + \delta x] = \varphi(x)\,\delta x / a. \]


Thus $P_x$, the chance of a stroke afterwards occurring between 0 and x, is
given by


\[ P_x = \int_0^x \varphi(x)\,dx / a, \]
and similarly


\[ Q_x = 1 - P_x = \int_x^a \varphi(x)\,dx / a, \]
while $P_0 = 0$ and $P_a = 1$. The probability of the combined event will be


\[ \delta P_x \cdot (P_x)^{p} (Q_x)^{q} \binom{p+q}{p}, \]


and hence the probability that the unknown probability lies between $P_b$
and $P_c$ (i.e. X lies between b and c) will be!


\[ \int_{P_b}^{P_c} (P_x)^{p} (1 - P_x)^{q}\,dP_x \Big/ \int_0^1 (P_x)^{p} (1 - P_x)^{q}\,dP_x. \]


Similarly the chance that m = r + s trials will yield r successes and s
failures will be


\[ \binom{r+s}{r} \int_0^1 (P_x)^{p+r} (1 - P_x)^{q+s}\,dP_x \Big/ \int_0^1 (P_x)^{p} (1 - P_x)^{q}\,dP_x. \]


This latter expression, like that given by Laplace, reduces to!4


\[ C_r = B(p + r + 1, q + s + 1) / B(p + 1, q + 1) B(r + 1, s + 1). \]
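
The point of Pearson's device, that integration with respect to $P_x$ removes all trace of the particular function φ, can be illustrated numerically. The sketch below rests on assumptions of our own: a line of unit length (a = 1), the hypothetical non-uniform density 2x for the strokes, and a simple midpoint rule. None of the function names is Pearson's.

```python
from math import factorial


def stroke_density(x: float) -> float:
    """Hypothetical non-uniform stroke density phi(x)/a on [0, 1]: here 2x."""
    return 2.0 * x


def stroke_cdf(x: float) -> float:
    """P_x, the chance of a stroke falling between 0 and x, for the density 2x."""
    return x * x


def integral_over_position(p: int, q: int, steps: int = 20000) -> float:
    """Midpoint-rule value of the integral of P_x^p (1 - P_x)^q dP_x over the line."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        px = stroke_cdf(x)
        # dP_x = (phi(x)/a) dx, so the density enters only through this factor.
        total += px**p * (1.0 - px) ** q * stroke_density(x) * h
    return total


def beta_integral(p: int, q: int) -> float:
    """Integral of x^p (1 - x)^q over [0, 1], the value under Bayes's uniform table."""
    return factorial(p) * factorial(q) / factorial(p + q + 1)


difference = abs(integral_over_position(3, 2) - beta_integral(3, 2))
```

The difference is of the order of the quadrature error only: whatever continuous φ is chosen, the integral in $P_x$ is the same beta integral that arises when the strokes are uniformly distributed.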


Two methods are now proposed for the simplification of C,.. The first of 
these, a somewhat more complete development than that given by Laplace, 
requires the replacement of the beta-functions by gamma-functions and the 
latters’ approximation by the Stirling-de Moivre formula for large values 
of p,q,r and s. The final result is 


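\[ C_r = C \exp(-h^2 T_1 / 2\sigma^2) \exp(-h T_2 / 2\sigma + h^3 T_3 / 6\sigma^3), \tag{25} \]
where

\[ T_1 = 1 - [(1 + 2\rho)/2m(1 + \rho)][(1 - 2PQ)/PQ], \]

\[ T_2 = (Q - P) \big/ \sqrt{m(1 + \rho)PQ}, \]

\[ T_3 = (1 + 2\rho)(Q - P) \big/ m(1 + \rho)PQ, \]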

C = e~pPgiT(m+2)/B(pt+1,¢4+1)(n+m+4 2) 


\[ 1 + \rho = (n + m)/n, \quad P = p/n, \quad Q = q/n, \]
\[ r = (mp/n) + h, \quad s = (mq/n) - h. \]


Equation (25) clearly shows that unless $1/\sqrt{m}$ be small, the terms in h
and h³ cannot be neglected: in other words, a skew frequency curve is
suggested, rather than a Normal one. After some further discussion of the
effect of the magnitude of $1/\sqrt{m}$, Pearson notes that the Gauss-Laplace
distribution fails


(a) for small samples. Its whole method of deduction is then
wrong for Stirling’s Theorem is invalid;

(b) when the sample is large, but the probability of occurrence 
is small, so that mP is finite and small. [p. 8] 


As a second method Pearson proposes to find a less rough approximation
to the original hypergeometrical (sic) series for $C_r$. Just as the Normal
density had been shown by Laplace to correspond to the symmetrical binomial
histogram, so Pearson finds in the present case (after considerable
manipulation and starting from the assumption that $C_0, C_1, \ldots, C_r, C_{r+1}, \ldots$
are plotted as a histogram of rectangles of base c and heights $C_0/c$, $C_1/c$,
..., $C_r/c$, $C_{r+1}/c$, ...) that the curve corresponding as closely to the skew
binomial histogram satisfies the differential equation


| a f(ob team 


where $\sigma^2 = PQ(m + 1)c^2$. Assuming rather more generally that


\[ \frac{1}{y}\frac{dy}{dx} = -x / (a_0 + a_1 x + a_2 x^2), \]


where $a_0 > 0$, $a_2 < 0$ and


ag = PQ(m4+1)\(1+(m+41)/n)e? 
a = Q-P)EH (m+n) o 
ag = —1/n ) 
Pearson writes
\[ \frac{1}{y}\frac{dy}{dx} = -x / [b_0 (b_1 - x)(b_2 + x)], \]


and hence obtains
\[ y = y_0 (1 + x/b_2)^{s_2} (1 - x/b_1)^{s_1}, \]


where $s_1 = b_1 / b_0 (b_1 + b_2)$ and $s_2 = b_2 / b_0 (b_1 + b_2)$, and $y_0$ is the modal
ordinate.
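
That the curve just given does satisfy the factored differential equation may be checked numerically by comparing a central-difference estimate of (1/y) dy/dx with the right-hand side. The values chosen below for $b_0$, $b_1$, $b_2$ are arbitrary illustrative numbers, not Pearson's, and the function names are hypothetical.

```python
from math import log


def skew_curve(x: float, b0: float, b1: float, b2: float, y0: float = 1.0) -> float:
    """y = y0 (1 + x/b2)^s2 (1 - x/b1)^s1, valid for -b2 < x < b1."""
    s1 = b1 / (b0 * (b1 + b2))
    s2 = b2 / (b0 * (b1 + b2))
    return y0 * (1.0 + x / b2) ** s2 * (1.0 - x / b1) ** s1


def log_derivative(x: float, b0: float, b1: float, b2: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of (1/y) dy/dx = d(log y)/dx."""
    upper = log(skew_curve(x + eps, b0, b1, b2))
    lower = log(skew_curve(x - eps, b0, b1, b2))
    return (upper - lower) / (2.0 * eps)


def ode_rhs(x: float, b0: float, b1: float, b2: float) -> float:
    """Right-hand side -x / [b0 (b1 - x)(b2 + x)] of the differential equation."""
    return -x / (b0 * (b1 - x) * (b2 + x))


b0, b1, b2 = 0.5, 3.0, 2.0  # arbitrary illustrative constants
residuals = [abs(log_derivative(x, b0, b1, b2) - ode_rhs(x, b0, b1, b2))
             for x in (-1.5, -0.5, 0.0, 1.0, 2.5)]
```

The mode falls at x = 0, where the right-hand side vanishes, in agreement with the description of $y_0$ as the modal ordinate.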

This result is then applied to the following problem: in a sample of size 
1,000, 20% of the individuals are found to possess a certain characteristic. 
What is the chance that such a percentage occurs in a further sample of 
size 100? This problem, of a type Pearson terms “the fundamental problem 
of statistics” [p. 12], is explored by both the above methods, the skew curve 
giving a much better fit to the series than does the Normal curve. A further 
problem, in which an indefinitely large population contains 10% of a given 
character, is considered, similar conclusions once again obtaining. 

Following closely upon this paper (in the same volume of Biometrika,
in fact) came Pearson’s “Note on the ‘Fundamental Problem of Practical
Statistics.’” The previous paper had apparently occasioned some
misunderstanding:


I believe it to be due to the critics not having read Bayes’ orig- 
inal theorem as given by Price in the Phil. Trans., Vol. LIII. 
[p. 300] 


Pearson repeats here Bayes’s argument: a ball is placed at random on a
table (of unit breadth, say), its distance from one side being x (a variate)
and its chance of falling between x and x + δx being δx. With Bayes’s
definition of “success” and “failure” it follows that the chance of p successes
and q failures will be
\[ \binom{p+q}{p} x^{p} (1 - x)^{q}\,dx. \]

Pearson now sagely notes that 


It is solely the fact that all possible values of the variate x are 
made a priori equally likely that makes the chance of a success 
x, equal to the variate itself. [p. 301] 


He now repeats his argument concerning $P_x$ of the earlier paper, showing
once again that the same conclusion is reached in this case as that in which 
the “equal distribution of ignorance” is assumed. The final paragraph is 
worth noting: 


I believe that in most cases such a variate [as x] may be
hypothecated and if it can the objection to Bayes that he made
all positions of his balls on the table “equally likely” can be
removed, and if removed one fundamental objection to his the- 
orem as he stated it, i.e. in terms of excess or defect of a variate, 
disappears. [p. 301] 


In 1924 in his “Note on Bayes’ Theorem”, Pearson becomes more
personal: instead of referring vaguely to “critics” he begins the present paper
with a sharp attack:


Dr. Burnside, I venture to think, does not realise either the 
method in which I approach Bayes’ Theorem, or the method in 
which Bayes actually approached it himself. [p. 190] 


(Burnside’s note, which immediately preceded this paper by Pearson, is
discussed in an appendix to this chapter.) Pearson once more repeats his
argument: suppose that an occurrence takes place if a certain variate X,
known to lie between two values 0 and a (say), exceeds a certain value ξ,
and suppose that the occurrence does not occur if X does not exceed ξ.
The value ξ being unknown, let us suppose that the frequency curve of
the a priori possible values of ξ is y = φ(ξ) dξ. (We assume that φ(·) is a
probability density function over [0, a].) Suppose further that the frequency
curve of X in the population (of size N) is Nf(x). Then the chance of an
occurrence is


\[ P_\xi = \int_0^{\xi} f(x)\,dx, \]


and hence
\[ \Pr[p \text{ occurrences \& } q \text{ non-occurrences} \mid \xi] = \binom{p+q}{p} P_\xi^{\,p} (1 - P_\xi)^{q}\,\varphi(\xi)\,d\xi. \]

Supposing next (as did Bayes) that Pr[ξ = ξ₀] ∝ Pr[event | ξ = ξ₀], we 
find that “the probability of the constitution being ξ” [p. 190] is 

$$P_\xi^{p}(1-P_\xi)^{q}\,\varphi(\xi)\,d\xi \Big/ \int_0^a P_\xi^{p}(1-P_\xi)^{q}\,\varphi(\xi)\,d\xi .$$

Hence the chance of an (r,s) sample following a (p,q) sample is 

$$C_{(p,q)(r,s)} = \binom{r+s}{r} \frac{\int_0^a P_\xi^{p+r}(1-P_\xi)^{q+s}\,\varphi(\xi)\,d\xi}{\int_0^a P_\xi^{p}(1-P_\xi)^{q}\,\varphi(\xi)\,d\xi} . \qquad (26)$$

Pearson next points out that this result generalizes that of Bayes in two 
respects: 


(i) Bayes assumes φ(ξ) = 1/a, i.e. all values of ξ are a priori equally 
likely; 


(ii) Bayes takes f(x) = 1/a also. 


It was indeed to overcome Bayes’s assumption that all values of X and all 
values of ξ are equally likely, says Pearson, 


that I wrote my paper of which Dr. Burnside, who does not 
seem to have read Bayes, disapproves. [p. 191] 


Since dP_ξ = f(ξ) dξ, expression (26) can be rewritten in the form 

$$C_{(p,q)(r,s)} = \binom{r+s}{r} \frac{\int_0^1 P_\xi^{p+r}(1-P_\xi)^{q+s}\,[\varphi(\xi)/f(\xi)]\,dP_\xi}{\int_0^1 P_\xi^{p}(1-P_\xi)^{q}\,[\varphi(\xi)/f(\xi)]\,dP_\xi} .$$

If we take φ(ξ) = f(ξ) and let P_ξ = x, we obtain Bayes’s Theorem, or 
so Pearson [1924b, p. 191] asserts, though as we have already noted he errs 
in this conclusion. Commenting further on the choice of φ and f, Pearson 
notes that 


If Bayes’ Theorem does not give us reasonable results, then we 
must select a better value of the ratio φ(ξ)/f(ξ) than unity, but 
at present it has not been demonstrated to lead to results con- 
trary to experience; it has been solely criticised on the ground 
that equal distribution of ignorance is not logical. [p. 191] 


Noting the difference between (26) and the formula cited by Burnside, 
Pearson says 


Dr. Burnside cites as Bayes’ formula, what is only an element 
in Bayes’ Theorem, and he does so on the strength of Poincaré, 
who in all probability had not studied Bayes’ original work. 
[p. 191] 


This is followed by fairly extensive discussion of the applicability of the 
“equal distribution of ignorance” assumption, and Pearson stresses that 


it cannot be too generally recognised that it is the basis of 
Bayes’ Theorem to assume no knowledge beyond the (p,q) ob- 
servation. [p. 192] 


Pearson also adduces reasons for his preferring the use of φ and f to 
Bayes’s assumptions, but notes that, in the preceding notation, the proba- 
bility distribution function F of P_ξ satisfies 

$$F(P_\xi) = \varphi(\xi)/f(\xi) .$$

It thus follows that if φ(ξ)/f(ξ) is constant, then P_ξ has a uniform distri- 
bution, “or as in Bayes’ Theorem all chances are equally likely” [p. 192]. 
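
The reduction just described can be checked numerically. The following Python sketch is offered only as an illustration under stated assumptions: it takes a = 1, a hypothetical frequency curve f(x) = 2x (so that P_ξ = ξ²), sets φ = f, and evaluates expression (26) by a midpoint rule; the function names are ours and appear nowhere in Pearson's note. The result is compared with the closed form obtained when P_ξ is uniformly distributed on [0, 1].

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(a, b):
    """Exact integral of x^a (1-x)^b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive_uniform(p, q, r, s):
    """Chance of an (r, s) sample after a (p, q) sample when P_xi is uniform."""
    return comb(r + s, r) * beta_int(p + r, q + s) / beta_int(p, q)


def predictive_numeric(p, q, r, s, big_p, phi, a=1.0, steps=20000):
    """Midpoint-rule evaluation of expression (26) on [0, a].

    big_p(xi) is the chance of an occurrence and phi(xi) the a priori
    frequency curve of xi; both are supplied by the caller.
    """
    h = a / steps
    num = den = 0.0
    for k in range(steps):
        xi = (k + 0.5) * h
        px, w = big_p(xi), phi(xi) * h
        num += px ** (p + r) * (1 - px) ** (q + s) * w
        den += px ** p * (1 - px) ** q * w
    return comb(r + s, r) * num / den


# Assumed illustration: f(x) = 2x on [0, 1], so P_xi = xi**2, and phi = f.
exact = predictive_uniform(3, 1, 2, 1)
approx = predictive_numeric(3, 1, 2, 1, lambda x: x * x, lambda x: 2 * x)
print(exact, approx)
```

With the ratio φ(ξ)/f(ξ) equal to unity the two values agree to within the accuracy of the quadrature, although neither f nor φ is itself uniform.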

The four papers by Pearson discussed so far form a quartet on which 
some comment may well be made. We have already mentioned Burnside’s 
criticism and Pearson’s rebuttal thereof, and shall say no more on this 
point. Writing in 1921, F.Y. Edgeworth comments as follows on Pearson’s 
“The fundamental problem of practical statistics” : 


Apparently Professor Pearson does not withdraw the counte- 
nance which in an earlier writing [The Grammar of Science, 
3rd edition, ch.iv, p. 146] he had given to the doctrine upheld 
by the present writer (Mind, 1884), that the equal distribution 
of a priori probability (in the absence of specific knowledge) 
rests on a rough but solid basis of experience. Professor Pear- 
son now seems to regard the doctrine, not indeed as untrue, but 
as unnecessary for the purposes of Inverse Probability. [p. 82] 


Commenting further on Pearson’s question “Is it not possible that any 
continuous distribution of a priori chances would lead equally well to the 
Bayes-Laplace result?” (op. cit., p. 4), Edgeworth notes that one may in- 
deed answer this in the affirmative without rejecting his own remark on 
the equal distribution of a priori probability. 
Further, although Pearson’s question is generally answered in the affir- 
mative, it is not for the reason advocated by its proposer: 


His reasoning seems to rest upon a very peculiar — not to say, 
hardly supposable — relation between the antecedent proba- 
bility that a certain “possibility” (in Laplace’s phrase) or con- 
stitution (e.g. of a coin or die) would have existed, and the a 
posteriori probability that, if it existed, such and such events 
(e.g. so many Heads or Aces in n trials) would be observed. 
[1921, p. 83] 


Assuming with Pearson that φ(x)δx/a is the a priori distribution of the 
chances, one notes that φ should not appear in the a posteriori probability. 
The fact that the usual Bayes-Laplace result is obtained under almost any 
continuous initial distribution was in essence noted by Cournot [1843] and 
Mill [1843]. This idea of the “swamping” of prior knowledge by experience 
is of course well known to modern Bayesians. 

In a remarkably controlled passage R.A. Fisher (1890-1962) remarks 
that “The fundamental problem of practical statistics” is a paper 


in which one of the most eminent of modern statisticians pre- 
sents what purports to be a general proof of Bayes’ postulate. 
[1922, p. 311] 


This is of course a totally inaccurate observation: no attempt at a “proof” 
of the postulate was essayed. 

More recently A.W.F. Edwards has commented in two papers on Pear- 
son’s early work involving Bayes’s Theorem. In the first of these papers, 
entitled “A problem in the doctrine of chances”, Edwards takes exception 
to the way in which Pearson framed his question, since 


to speak of ‘the probability of its occurring r times’ is to beg 
part of the question, for probability may not be the proper 
calculus for prediction. [1974, p. 44] 


After noting Pearson’s expression 

$$C_r = B(p+r+1,\,q+s+1) \big/ B(p+1,\,q+1)\,B(r+1,\,s+1) ,$$


a result that is independent of any non-uniform prior, Edwards writes 


Pearson’s capacity for not seeing the wood for the trees was 
exceptional. Instead of commenting on this remarkable inde- 
pendence, he thought he had solved the fundamental problem, 
and busied himself in the rest of the paper with evaluating beta- 
integrals. [1974, p. 46] 




Edwards notes further in this paper that Edgeworth [1921] and Burnside 
[1924] showed the unsuitability of the Bayes model inasmuch as it implies 
a relation between the prior and the likelihood, and “Pearson ... took the 
point eventually” [1974, p. 46]. 

In the second paper, “Commentary on the arguments of Thomas Bayes” , 
Edwards essentially repeats his earlier arguments, though in a more concise 
form. He makes the additional point that 


Pearson had made the mistake of identifying the distribution of 
the throws of the ball with the prior distribution of the proba- 
bility. [1978, p. 118] 


In 1979 D. Hinkley took a fresh look at Pearson’s “fundamental prob- 
lem of practical statistics”, providing a definition of predictive likelihood 
“which can produce a simple prediction analog of the Bayesian parametric 
result, posterior ∝ prior × likelihood” [p. 718]. Hinkley errs however in as- 
serting that “Pearson’s purpose was to reexamine the general applicability 
of Bayes’s earlier solution” (loc. cit.): at least, while that might have been 
Pearson’s aim, Bayes’s result, as we have already seen, is not concerned 
with future events (that extension is due to Price). 

Now let us return to Pearson’s work. In his paper “James Bernoulli’s the- 
orem”, published in Biometrika in 1925, Pearson, in between his discussion 
of Bernoulli’s proof and his own treatment of the problem, remarks 


Bernoulli then turns the problem round and says that if the 
observed value in nt trials be p, then the true value p₀ will 
lie between p ± 1/t with the given probability. This is rather 
stated than proved, but it is of course the kernel of much later 
developments of importance. Leibnitz raised objections to it. 
[p. 205] 


No further comments on this point are however made: we have already said 
something on this score elsewhere. 

Pearson next turns his attention to a critical examination of a commonly 
used sampling method in his 1928 paper entitled “On a method of ascer- 
taining limits to the actual number of marked members in a population of 
given size from a sample” published in Biometrika. The problem considered 
is the following: a population of size N contains p marked and q = N − p 
unmarked individuals, a sample of size n from this population being found 
to contain r marked and s = n — r unmarked members. It is usual to 
estimate the percentage of marked individuals as 


$$100r/n \pm 67.449\sqrt{rs/n^3} ,$$


the probable error thus found being taken as a rough measure of the 
possible deviation of the sample value 100r/n from the actual (though 
unknown) value 100p/N. Pearson finds the reasoning leading to this result 
unsatisfactory, on the following grounds: 


first, because the result is independent of the size of the popu- 
lation sampled, and secondly because it really throws us back 
on the normal curve as representing the binomial, and this will 
only be correct if r or s be not small as compared with n. 
[p. 149] 

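
The customary estimate that Pearson criticises is easily computed. The short Python sketch below is an illustration only: the function name and the sample figures are hypothetical, and the constant 67.449 is simply 100 times the factor 0.67449 that converts a standard deviation into a probable error.

```python
from math import sqrt


def percentage_with_probable_error(r, n):
    """Return (100r/n, 67.449*sqrt(r*s/n^3)) for r marked members in a sample of n."""
    s = n - r                                   # unmarked members in the sample
    estimate = 100.0 * r / n                    # sample percentage of marked members
    probable_error = 67.449 * sqrt(r * s / n ** 3)
    return estimate, probable_error


# Hypothetical sample: 30 marked members in a sample of 100.
est, pe = percentage_with_probable_error(30, 100)
print(f"{est:.1f}% +/- {pe:.2f}%")
```

The population size N is not among the arguments of the function, which is precisely the first of Pearson's two objections.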
Pearson sees in this general question two distinct problems: 


(i) on the basis of a sample of size n = r + s, what will be the distribution 
of r′ and s′ in a further sample of size n′ = r′ + s′? 


(ii) on the basis of a sample of size n = r + s, what is the distribution of 
p and q? 


(Pearson also refers to the quaesitum in this second question as “the like- 
lihood of various values of p and q in the actual population N” [1928, 
p. 149].) The first of these problems Pearson views as involving an appeal 
to Bayes’s Theorem, and it was discussed in his paper of 1907. It is hardly 
necessary once again to stress that it was Price who extended Bayes’s result 
to the question of future observations: thus Pearson is slightly inaccurate 
in his present observation. 

Proceeding to the second question, we find (sampling occurring without 
replacement) that 

$$C_{r,p} = \Pr[r \mid p] = \binom{p}{r}\binom{N-p}{n-r} \Big/ \binom{N}{n} .$$

Hence, by the theory of inverse probability (a theory that Pearson does 
not associate with Bayes, apparently) we have 

$$C_{p,r} = \Pr[p \mid r] \propto p!\,(N-p)! \big/ (p-r)!\,(N-p-n+r)! .$$

Pearson determines the constant of proportionality by summing an appro- 
priate hypergeometric series (a method advantageously used in his paper 
of 1907): recourse to the definition of conditional probability results more 
swiftly in the solution 

$$C_{p,r} = \frac{n+1}{N+1}\, C_{r,p} ,$$

the prior probability of the population’s containing p marked items and the 
probability of a sample of size n containing r marked individuals being 
(N + 1)⁻¹ and (n + 1)⁻¹ respectively. 

Consideration is next given to the finding of various moments of the 
hypergeometric series; and in view of the labour that might be incurred in 
computing the successive terms of such a series, Pearson suggests that the 
series be replaced by its appropriate frequency curve, found to be 

$$y = y_0\,(rb/n + x)^{r}\,(sb/n - x)^{s} ,$$

where y₀ = M(n + 1)!/(b^{n+1} r! s!), and M is “the total number of possible 
populations from which the sample may have been drawn” [p. 157]. Four 
different determinations of the curve range b are suggested, and Pearson 
plumps eventually for b = √((N + 2)(N − n)). Several examples follow. 
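
The constant of proportionality in the second problem may be verified by brute force. In the following Python sketch (function names and the numerical values are hypothetical, chosen only for illustration) the posterior Pr[p | r] is normalised by direct summation over p = 0, 1, ..., N under the uniform prior 1/(N + 1), and the result is compared with the factor (n + 1)/(N + 1) times the hypergeometric sampling probability.

```python
from fractions import Fraction
from math import comb


def sampling_prob(r, p, N, n):
    """Pr[r | p]: r marked members in a sample of n drawn without replacement."""
    return Fraction(comb(p, r) * comb(N - p, n - r), comb(N, n))


def posterior(p, r, N, n):
    """Pr[p | r] under a uniform prior on p, normalised by direct summation."""
    total = sum(sampling_prob(r, k, N, n) for k in range(N + 1))
    return sampling_prob(r, p, N, n) / total


# Hypothetical figures: a population of 20 and a sample of 5 containing 2 marked.
N, n, r = 20, 5, 2
for p in range(N + 1):
    assert posterior(p, r, N, n) == Fraction(n + 1, N + 1) * sampling_prob(r, p, N, n)
```

Exact rational arithmetic is used so that the comparison is an identity rather than an approximation; the summation plays the part of Pearson's hypergeometric series.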

The paper concludes with two appendices: in the first of these Venn’s 
criticism of inverse probability is examined, while in the second it is a 
solution by Laplace that falls under the microscope. These appendices have 
been discussed in the present work in the appropriate chapters. 

Despite his support for inverse probability in general, Pearson seems to 
have made little use of it in his more statistical work: indeed, Jeffreys notes 
that 


An enigmatic position in the history of the theory of probabil- 
ity is occupied by Karl Pearson ... The anomalous feature of 
his work is that though he always maintained the principle of 
inverse probability, and made this important advance, he sel- 
dom used it in actual applications, and usually presented his 
results in a form that appears to identify a probability with a 
frequency. [1961, p. 383] 


Before we leave Pearson we might note that he paid some attention 
to Bayes’s Theorem and its applications in his lectures (see E.S. Pearson 
[1938] for further details). The interested reader might also consult Pearson 
[1978]. 


9.19 Miscellaneous 


There are a few works that, although they fall in this period, have not been 
discussed here as I have been unable to examine them in detail. They are 
Fujisawa [1891] (in which the rule of succession as generalized to (r + s) 
future events is discussed), Gosiewski [1886] (the inversion of Bernoulli’s 
Theorem), and Nekrassoff [c.1890] (the inversion of Bernoulli’s Theorem). 

There are also pertinent, though slight, passages in Hagen [1837] and 
Sorel [1887]. In the first of these works we find a discussion of the rule 
of succession and of the sun’s rising, while in the second it is noted that 
Bernoulli’s Theorem is incomplete without an inverse. Neither contribution, 
however, is of sufficient depth to warrant detailed discussion in the present 
work. 




9.20 Appendix 9.1 


In 1924 William Burnside (1852-1927) published a note “On Bayes’ for- 
mula” in Biometrika. Here, following Poincaré, he stated this formula as 


$$\Pr[A_i \mid B] = \Pr[A_i]\,\Pr[B \mid A_i] \Big/ \sum_j \Pr[A_j]\,\Pr[B \mid A_j] , \qquad i \in \{1, 2, \ldots, n\} .$$


Burnside claimed that the argument given by Pearson [1920a, p. 5] was 
unsatisfactory in that the numerical value of Pr[B | A_i] depended not only 
on the nature of A_i but also on Pr[A_i]. However, an examination of the 
problem initially posed by Pearson had persuaded him that the value of 
Pr[A_i] had no relation to nor effect on the value of Pr[B | A_i], and he 
concluded 


There is therefore no reason for supposing that any conclusions 
drawn from the investigation on p. 5 will hold with respect 
to the statistical problem stated at the beginning of Professor 
Pearson’s paper. [p. 189] 
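
The formula as Burnside states it is readily illustrated. The Python sketch below uses hypothetical prior probabilities and likelihoods (they come from no source discussed here), and the function name is ours; it simply shows the arithmetic of the discrete formula, in which the values Pr[B | A_i] are supplied independently of the values Pr[A_i].

```python
from fractions import Fraction


def bayes_formula(priors, likelihoods):
    """Return Pr[A_i | B] for each i, given Pr[A_i] and Pr[B | A_i]."""
    joint = [pa * pb for pa, pb in zip(priors, likelihoods)]
    total = sum(joint)                      # the sum in the denominator
    return [j / total for j in joint]


# Hypothetical figures: three equally likely causes with differing likelihoods.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
posteriors = bayes_formula(priors, likelihoods)
print(posteriors)
```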


9.21 Appendix 9.2 


Original text of extracts given in translation in §9.6. 


1. A blindfolded person ... black and white marbles. Af en Pose er der 
udtrukken Kugler iblinde; det har vist sig, at et vist Antal af disse 
vare hvide, et andet Antal sorte, der spørges om Sandsynligheden for, 
at Posen har et bestemt Indhold, f. Ex. lige mange hvide og sorte; 
man ved, at Kuglerne ere enten sorte eller hvide. 


2. [This] above-mentioned disparity... Bayes’s theorem. Den Strid, som 
i det foregaaende omtales, hidrører udelukkende fra den forskjellige 
Anvendelse af Bayes’ Regel. 


3. Most people will... assumed to be worth 10 Øre. Men det vil vistnok 
af de fleste betragtes som absurd, at Kjøberen skal betale mere for 
sine Varer, fordi han har sorteret Prøverne, uagtet de enkelte Stykker, 
før og efter Sorteringen, antages at være 10 Øre værd. 


4. A contradiction immediately ... subdivisions of time. Men alligevel 
kommer der strax Strid, saasnart man tager Hensyn til, at der kan 
anvendes forskjellige Inddelinger af Tiden. 


5. Bayes’s theorem is ... as to necessary causes. Bayes’ Regel i alle 
saadanne Tilfælde, hvor man intet véd a priori om de søgte Aarsager. 


Notes 


Neither indeed would I have put my selfe 
to the labour of writing any Notes at all, if 
the booke could as well have wanted them, 
as I could easilie have found as well, or 
better to my minde, how to bestow my 
time. 


Marcus Aurelius Antoninus. 


Chapter 1 
1. W.S. Gilbert, The Mikado, 1885. 


2. Commentators are divided on whether Hazlitt should be regarded 
as an essayist or a critic; thus Priestley [1960, p. 6] prefers to de- 
scribe him as an essayist, while Brett [1977, p. 5] regards him as a 
biographical critic. 

3. The first abstract presentation of this idea may be found in Plato’s 
Republic, the parable reading as follows in the Analysis to Book VII, 
Steph. 514-517 of Jowett’s translation: 


Imagine human beings living in an underground den which 
is open towards the light; they have been there from child- 
hood, having their necks and legs chained, and can only 
see into the den. At a distance there is a fire, and between 
the fire and the prisoners a raised way, and a low wall is 
built along the way, like the screen over which marionette 
players show their puppets. Behind the wall appear mov- 
ing figures, who hold in their hands various works of art, 
and among them images of men and animals, wood and 
stone, and some of the passers-by are talking and others 
silent. ... they see only the shadows of the images which 
the fire throws on the wall of the den; to these they give 
names, and if we add an echo which returns from the wall, 
the voices of the passengers will seem to proceed from the 
shadows. Suppose now that you suddenly turn them round 
and make them look with pain and grief to themselves at 
the real images; will they believe them to be real? Will not 
their eyes be dazzled, and will they not try to get away from 
the light to something which they are able to behold with- 
out blinking? And suppose further, that they are dragged 
up a steep and rugged ascent into the presence of the sun 
himself, will not their sight be darkened with the excess 
of light? Some time will pass before they get the habit of 
perceiving at all; and at first they will be able to perceive 
only shadows and reflections in the water; then they will 
recognize the moon and the stars, and will at length be- 
hold the sun in his own proper place as he is. ... How will 
they rejoice in passing from darkness to light! How worth- 
less to them will seem the honours and glories of the den! 
But now imagine further, that they descend into their old 
habitations;— in that underground dwelling they will not 
see as well as their fellows, and will not be able to compete 
with them in the measurement of the shadows on the wall 
... Now the cave or den is the world of sight, the fire is 
the sun, the way upwards is the way to knowledge, and in 
the world of knowledge the idea of good is last seen and 
with difficulty, but when seen is inferred to be the author 
of good and right. [1888, pp. xcviii–xcix] 


The allegory was rehearsed by Bacon in his Novum Organum of 1620, 
where we read 


The Idols [i.e. illusions or false appearances] of the Cave 
are the Idols of the individual man. For every one (besides 
the errors common to human nature in general) has a cave 
or den of his own, which refracts and discolours the light 
of nature; owing either to his own proper and peculiar na- 
ture; or to his education and conversation with others; or to 
the reading of books, and the authority of those whom he 
esteems and admires; or to the differences of impressions, 
accordingly as they take place in a mind preoccupied and 
predisposed or in a mind indifferent and settled; or the like. 
So that the spirit of man (according as it is meted out to 
different individuals) is in fact a thing variable and full of 
perturbation, and governed as it were by chance. 


[Aphorism XLII] 


This translation of Bacon’s original Latin text is by Robert Leslie El- 
lis, some of whose statistical work will be discussed in a later chapter. 
4. See Romanov [1974], Sabatier [1978] and Talenti [1986]. 
5. For a good discussion of moment problems see Shohat & Tamarkin 
[1943]. 
6. For further discussion of the rôle of inverse probability in error mea- 
surement see Edgeworth [1911]. 

7. For instance Salmon [1966, p. 118] finds that plausibility arguments 
not only are an essential ingredient in scientific inference, but also 
“embody considerations relevant to the evaluation of prior probabili- 
ties” (Salmon’s views on this matter have been discussed, somewhat 
slightingly, by Weimer [1975]). For a discussion of mathematical mod- 
els of uncertainty and indeterminacy see Walley [1991, §1.8]. 

8. See also §5.14. 


9. The original reads as follows: 


At hic tandem nobis aqua hærere videtur, cum vix in pau- 
cissimis præstare hoc liceat, nec alibi fere succedat, quam 
in aleæ ludis, quos primi inventores ad æquitatem ipsis 
conciliandam data opera sic instituerunt, ut certi notique 
essent numeri casuum, ad quos sequi debet lucrum aut 
damnum, & ut casus hi omnes pari facilitate obtingere 
possent. In cæteris enim plerisque vel a naturæ operatione 
vel ab hominum arbitrio pendentibus effectis id neutiquam 
locum habet. [p. 223] 


10. Bernoulli’s words are 


Verum enimvero alia hic nobis via suppetit, qua quaesitum 
obtineamus; & quod a priori elicere non datur, saltem à pos- 
teriori, hoc est, ex eventu in similibus exemplis multoties 
observato eruere licebit; quandoquidem præsumi debet, tot 
casibus unumquodque posthac contingere & non contingere 
posse, quoties id antehac in simili rerum statu contigisse & 
non contigisse fuerit deprehensum. [p. 224] 


11. The original runs as follows: 


Hoc igitur est illud Problema, quod evulgandum hoc loco 
proposui, postquam jam per vicennium pressi, & cujus tum 
novitas, tum summa utilitas cum pari conjuncta difficul- 
tate omnibus reliquis hujus doctrinæ capitibus pondus & 
pretium superaddere potest. [p. 227] 


Both David [1962, p. 136] and Sung [1966, p. 42] translate “super- 
addere” as “exceed”, the impression thus being conveyed that the 
novelty, &c. of the problem exceed the value of the rest of the work. 
The translation I have given here, which I believe to be more ap- 
propriate, agrees with that given by de Moivre [1756, p. 254] and 
Haussner [1899, II, p. 92]. 


12. This passage runs as follows in the original: 


Sit igitur numerus casuum fertilium ad numerum steril- 
ium vel præcisè vel proxime in ratione r/s, adeoque ad 
numerum omnium in ratione r/(r +s) seu r/t, quam ra- 
tionem terminent limites (r+1)/t & (r—1)/t. Ostendendum 
est, tot posse capi experimenta, ut datis quotlibet (puta 
c) vicibus verisimilius evadat, numerum fertilium observa- 
tionum intra hos limites quam extra casurum esse, h. e. 
numerum fertilium ad numerum omnium observationum 
rationem habiturum nec majorem quam (r+ 1)/t, nec mi- 
norem quam (r—1)/t. [p. 236] 


13. Compare Sung [1966, p. 4]. 
14. See also David [1962, p. 137]. 
15. The passage translated here is 


Unde tandem hoc singulare sequi videtur, quod si eventuum 
omnium observationes per totam æternitatem continuaren- 
tur, (probabilitate ultimo in perfectam certitudinem abe- 
unte) omnia in mundo certis rationibus & constanti vicis- 
situdinis lege contingere deprehenderentur; adeo ut etiam 
in maximé casualibus atque fortuitis quandam quasi neces- 
sitatem, &, ut sic dicam, fatalitatem agnoscere teneamur. 


[p. 239] 


16. See Hacking [1975, pp. 149, 154]. 

17. The presence of this passage was drawn to the attention of the sta- 
tistical community by S.M. Stigler in 1983. 

18. The presence of the expectation operator here rather than the prob- 
abilistic one is of no moment. 

19. The distinction was, it would appear, also noted by Good [1959], 
who wrote of Bayes’s attempt at an inversion of Bernoulli’s Theorem. 
Stigler [1986a, p. 100] regards the inversion of the results of Bernoulli 
and de Moivre as “The chief conceptual step taken in the eighteenth 
century toward the application of probability to quantitative infer- 
ence”. 

20. The sketch illustrating the situation detailed in the following quota- 
tion may be found in Chapter 4, Figure 4.1. 

21. Each of these symptoms may be viewed as a complex of symptoms. 

22. In translation: “O happy he who can determine the causes of events!” 
Commentators agree in stating that Virgil is following Lucretius here. 
A similar sentiment was expressed in Matthew Arnold’s Memorial 
Verses of 1850: “And he [i.e. Goethe] was happy, if to know, causes 
of things ...”. 

23. The connexion is also considered in Keynes [1921, pp. 401-402, 420- 
422]. 


24. The idea of Pr[p|q], the probability of p on data q, is taken as funda- 
mental by Jeffreys [1961, p. 15]. 

25. Perks’s paper of 1947 is an important one in the annals of inverse 
probability: not only are exceedingly pertinent comments made on 
this topic, but a new indifference rule is proposed. 

26. See Robert [1994, p. 370]. 

27. See O’Hagan [1994, p. 134]. 

28. Seidenfeld [1979, p. 19]. 

29. The name of this “eminent professor” is not given, although a footnote 
to this page gives the other lecturers as Dr Venn, Professor Weldon 
and Sir Robert Ball. 


Chapter 2 


1. This description is from Hacking’s introduction to the English trans- 
lation of Maistrov [1974, p. vii]. 

2. Forsaking, in this respect, what is described in the 14th edition [1939] 
of the Encyclopedia Britannica as its “career of plain usefulness” 
[vol. 3, p. 596]. However a supplementary volume, The Dictionary of 
National Biography: Missing Persons of 1993, does contain a note, 
by A.W.F. Edwards, on Thomas Bayes. 


3. See Barnard [1958, p. 293]. 
4. The note, by Thomas Fisher, LL.D., reads in its entirety as follows: 


“Bayes, Thomas, a presbyterian minister, for some time assistant to 
his father, Joshua Bayes, but afterwards settled as pastor of a con- 
gregation at Tunbridge Wells, where he died, April 17, 1761. He was 
F.R.S., and distinguished as a mathematician. He took part in the 
controversy on fluxions against Bishop Berkeley, by publishing an 
anonymous pamphlet, entitled “An Introduction to the Doctrine of 
Fluxions, and Defence of the Mathematicians against the Author of 
the Analyst,” London, 1736, 8vo. He is the author of two mathemati- 
cal papers in the Philosophical Transactions. An anonymous tract by 
him, under the title of “Divine Benevolence” , in reply to one on Divine 
Rectitude, by John Balguy, likewise anonymous, attracted much at- 
tention.” For further comments on this reference see Anderson [1941, 
p. 161]. The Winkler Prins Encyclopaedie [1948], in the entry under 
“Bayes, Thomas”, has only this to say by way of biography: “Engels 
wiskundige (?-1763), omtrent wiens leven wij vrijwel niets weten” 
[vol. 3, p. 396]. It does, however, give a clear and correct discussion 
of Bayes’s Theorem. 


5. Anderson [1941, p. 160]. If Thomas was born before 25th March (New 
Year’s Day in the old English calendar), the year could be 1701 (old 
style), though it would be 1702 (new style). The day of his death 
being the seventh, we find on subtracting the 11 days lost by the 
calendar reformation of 1752, that his death was on the 27th March 
1761 (o.s.). Subtraction of his age at death, viz. 59, shows that 
he was either born on or just after New Year’s Day 1702 (o.s.), or in 
1701 (o.s.). See Bellhouse [1988a]. 


6. Hacking [1970] and the 15th edition of the Encyclopedia Britannica 
[1980], (vol. I of the Micropedia). 


7. Holland [1962, p. 451]. 
8. Barnard [1964], Hacking [1970]. 
9. Holland [1962, p. 452], Pearson [1978, p. 355] and Wilson [1814]. 
Bogue and Bennett [vol. II, 1809] point out that “The necessities of 
the church may render it proper that men should be ministers, who 
have not enjoyed the advantages of an academical, or even a liberal 
education” [p. 7]. 

10. Pearson [1978, p. 356] states that this academy was founded by the 
Congregational Fund Board in 1695. 

11. Writing of the course of instruction for the Christian ministry, Bogue 
and Bennett [vol. III, 1810] say “To mathematics and natural phi- 
losophy it has usually been judged proper to apply a portion of the 
student’s time. As they tend to improve the mind, and peculiarly to 
exercise its powers, and call forth their energies, the general influence 
of both may be favourable to his future labours, and the hearers as 
well as the preacher experience their good effects” [p. 270]. 

12. H.M. Walker, in her biographical notes appended to the 1967 reprint 
of de Moivre’s The Doctrine of Chances, writes “... Thomas Bayes, 
with whom he [i.e. de Moivre] is not known to have been associated” 
[p. 367]. An outrageous statement is made by Epstein [1967]: speak- 
ing of de Moivre, he writes “His mathematics classes were held at 
Slaughter’s Coffee House in St. Martin’s Lane: one successful student 
was Thomas Bayes” [p. 5]. In similarly bold vein Arne Fisher [1926, 
p. 13] describes Bayes as an “Oxford clergyman”. 

13. Holland [1962, p. 452]. Proving how little times have changed since 
then, Kac [1985] in his autobiography says “The way Stan [Ulam] 
did mathematics was by talking, a work style which goes back to 
his young days in Lwow, which were spent largely in coffee houses 
(mainly in Szocka, which is Polish for “Scottish Café”) endlessly dis- 
cussing problems, ideas and conjectures. Great stuff came of this 
highly unorthodox way of doing mathematics ... ” [pp. xx—xxi]. See 
also Ciesielski [1987]. 

14. See Dale [1990]. 

15. Holland [1962, p. 453]. See also James [1867]. 

16. James [1867]. 

Holland [1962, p. 453] states that this move of Thomas’s was made in 
1731, while Barnard [1958, p. 293] merely notes that “he was certainly 
there in 1731”. See also Pearson [1978, p. 357] and Timpson [1859,
p. 464]. However, in the Minute Books of the Body of Protestant
Dissenting Ministers of the Three Denominations in and about the Cities
of London and Westminster the following entry may be found: “Oct.
3rd. 1732. List of approved Ministers of the Presbyterian denomination.

    Mr. Bayes Senr. }
    Mr. Bayes Junr. } Leather Lane.”

I would suggest therefore that Thomas Bayes moved to Tunbridge Wells
somewhat later than the usually cited date of 1731.

18. Barnard [1958, p. 293]. Burr points out further that “a Methodist
meeting-house has also been erected at Tunbridge-Wells since the
rise of that deluded sect” [1766, p. 104].

19. The following descriptions of Tunbridge Wells are from Burr [1766]:
“Tunbridge-Wells is situated on the southern side of the county of
Kent, just on the borders of Sussex, and about thirty-six miles from
London. It is partly built in Tunbridge parish, partly in Frant parish,
and partly in Speldhurst parish; and consists of four little villages,
named Mount-Ephraim, Mount-Pleasant, Mount-Sion, and the Wells”
[pp. 98-99]. “An excellent bowling-green, the old assembly-room, and
a capacious handsome Presbyterian meeting-house, are all situated
upon Mount-Sion” [p. 104]. Further details may be found in Holland
[1962, pp. 453-454].

20. Some doubt as to Ditton’s religious convictions exists. The Imperial
Dictionary of Universal Biography says “The son [i.e. Humphrey], in
opposition to the nonconformist wishes of the father [also Humphrey],
entered the English church.” On the other hand, in the Dictionary
of National Biography we find “The younger Ditton afterwards became
a dissenting preacher at his father’s desire”. The New General
Biographical Dictionary is more cautious and merely states “he [i.e.
Humphrey] at the desire of his father, although contrary to his own
inclination, engaged in the profession of divinity”.

21. Holland [1962, p. 455].

22. Cajori [1919a] writes “the publication of Berkeley’s Analyst was the
most spectacular mathematical event of the eighteenth century in
England” [p. 219].

23. Pearson [1978, p. 360] notes that Jurin was the secretary of the Royal
Society: in fact, Jurin occupied this position jointly with John Machin
from 1721 to 1727.

24. For further details see Pearson [1978, p. 360].

25. Extracts from this tract are given in Holland [1962, pp. 455-456]: for
a detailed discussion of the paper see Smith [1980]. The Dictionary of
Anonymous and Pseudonymous English Literature cites de Morgan
as its source of information on the authorship of this work.

26. Holland [1962] writes “His 1736 tract in defence of the mathematical
art was published at an opportune time and was of such merit that
it is likely that his election was unanimously agreed to by members
of this distinguished body ... We do not believe he obtained election
in the manner of some members who contributed nothing of merit
but were wealthy enough to pay the high admittance fees and yearly
dues” [p. 459]. See also Hacking [1970, p. 531], Maistrov [1974, p. 88],
Pearson [1978, p. 356], Keynes [1973, p. 192], Pearson [1978, p. 350]
and Timerding [1908, p. 44]. The last three of these references
incorrectly give Bayes’s date of election as 1741. Elected at the same time
as Bayes were Walter Bowman and Michel Fourmont: see Signatures
in the First Journal-Book and the Charter-Book of the Royal Society
[1912].

27. Quoted in Holland [1962, p. 459].

28. One trusts that this last word is not used with the meaning that
Chambers Twentieth Century Dictionary assures one it has in
booksellers’ catalogues!

29. Reprinted in Pearson [1978, p. 357]: for details of the signatories see
pp. 357-358, op. cit.

30. Pearson [1978, pp. 360-361]. That Bayes (and his clerical coevals)
should have had time to indulge in mathematical and scientific
pursuits is hardly surprising when one bears in mind the following
passage from an editor of Derham’s Physico-Theology (first issued in
1713): “The life of a country clergyman is in every respect more
favourable to the cultivation of natural science, by experiment and
observation, than any other professional employment. He has all the
leisure that is requisite to philosophic researches; he can watch the
success of his experiments from day to day, and institute long
processes without interruption, or record his observations without chasm
or discontinuation.” Not that their professional duties were ignored,
however: Bogue and Bennett write “As to the quantity of labour
performed by dissenting ministers of evangelical principles (the religious
principles of the old nonconformists), they need not blush at a
comparison with those of the preceding times. To the two public services
of former times, a third has now been generally added, and evening
lectures are become in most congregations the stated practice. In the
course of the week too, there is a public season for worship in one
of the evenings, so that the minister has to preach four times from
Sabbath to Sabbath” [vol. IV, 1812, pp. 343-344].

31. This notebook, although strictly speaking to be numbered among the
adespota to be attributed to Bayes, bears on its first page the
handwritten words “This book appears to be a mathematical notebook by
Rev. Thomas Bayes, F.R.S. The handwriting agrees very well with
papers by him in the Canton papers of the Royal Society Vol. 2,
p. 32.” This note is dated 21-1-1947 and is signed by M.E. Ogborn.

32. Holland [1962, pp. 456-459]. Richard Price (1723-1791) was a
dissenting minister, mathematician and political economist. His actuarial
work led to his election as Fellow of the Royal Society in 1765, and
the honorary degree of D.D. was awarded him on the 7th August 1767
by Marischal College, Aberdeen (not by the University of Glasgow,
as sometimes stated). An LL.D. from Yale followed in 1781. William
Morgan (1750-1833) was the son of Price’s sister Sarah. His actuarial
writings led to the award of the gold medal of the Royal Society and
a fellowship.

The key is to Aulay Macaulay’s system. The shorthand actually used 
by Bayes in the notebook has been identified as being basically that 
derived in the 17th century by Thomas Shelton and modified by El- 
isha Coles. See Holland [1962, p. 458] and Home [1974-1975, p. 83]. 

See Archibald [1926], Molina and Deming [1940], Pearson [1924a,
p. 404] and Pearson [1978, p. 358]. For details of the relative contribu- 
tions of de Moivre and Stirling see Tweedie [1922, pp. 203-204], and, 
for a detailed examination of de Moivre’s work, see Schneider [1968]. 
De Moivre wrote a short paper on the approximation of the greatest
term of $(a + b)^n$ as a series. This paper, originally printed on 12th
November, 1733, was later translated, with only minor alterations, in
the second [1738] and third [1756] editions of de Moivre’s The Doctrine
of Chances. The first printing had only a limited circulation: indeed,
de Moivre prefaced his later translation with the words “I shall here
translate a Paper of mine which was printed November 12, 1733, and
communicated to some Friends, but never yet made public, reserving
to myself the right of enlarging my own Thoughts, as occasion shall
require” [1756, p. 242]. If Barnard’s speculation that Thomas Bayes
might have learned mathematics from de Moivre is correct [Barnard
1958, p. 293], it is tempting to conjecture further that Bayes might
have been one of the privileged circle to see the 1733 Approximatio ad
Summam Terminorum Binomii $(a + b)^n$ in Seriem expansi. However,
several papers on infinite series had been published in the Philosophi- 
cal Transactions before Bayes’s paper on this subject (communicated 
1761, published 1764), so one should perhaps not rely too much on the 
possible friendship between de Moivre and Bayes as an explanation 
of the latter’s consideration of “Stirling’s Theorem”. 
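
For readers who want a numerical feel for the kind of result at issue in the 1733 paper, the following is a minimal sketch. It assumes the modern statement usually associated with de Moivre and Stirling, namely that for large even n the ratio of the middle term of (1 + 1)^n to the sum 2^n is close to 2/sqrt(2*pi*n). The function names are illustrative only and are not taken from any of the works cited.

```python
from math import comb, pi, sqrt


def middle_term_ratio(n):
    """Exact ratio of the middle term of (1 + 1)^n to the sum 2^n (n even)."""
    return comb(n, n // 2) / 2**n


def de_moivre_approx(n):
    """Modern form of the approximation: 2 / sqrt(2 * pi * n)."""
    return 2 / sqrt(2 * pi * n)


if __name__ == "__main__":
    # The two columns should agree more and more closely as n grows.
    for n in (10, 100, 1000):
        print(n, middle_term_ratio(n), de_moivre_approx(n))
```

The comparison is exact on the left-hand side (integer arithmetic until the final division), so any discrepancy shown is the error of the approximation itself.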

In Morgan’s biography [1815] of Price may be found the following 
remarks: “it [i.e. Bayes’s Essay] was presented by Mr. Canton to the 
Royal Society, and published in their Transactions in 1763. — Having 
sent a copy of his paper to Dr. Franklin, who was then in America, he 
[i.e. Price] had the satisfaction of witnessing its insertion the following
year in the American Philosophical Transactions.” The records of 
the American Philosophical Society for 1762-1766 have apparently 
been lost, and recent research has failed to find any record of this 
alleged communication by Price to Franklin. It seems that on this 
point Morgan erred. Walter Ashburner, a direct descendant of Price’s 
sister, in a memorandum sent to the president of the Massachusetts 
Historical Society in 1903 (see Letters to and from Richard Price) in 
fact wrote “William Morgan was a distinguished mathematician ...
but he was not a good biographer” [p. 4]. Thomas [1924] describes 
Morgan’s Memoirs of Price as “inadequate, and, unfortunately, often 
inaccurate” [p. iii]. For biographical notes on Price see Bogue and 
Bennett [vol. IV, 1812, pp. 421-425] and Holland [1968]; a full-length
biography is Laboucheix [1970]. 

See Clay [1895] and Leader [1897]. 

See Pearson [1978, p. 355] and Stephen [1885]. 

The saint’s day seems to be a movable feast: the 14th edition [1939] of 
the Encyclopedia Britannica, the source of the quotation cited, gives 
it variously also as August 24th (1666) and 12th; W.B. Forbush’s 
edition of Fox’s Book of Martyrs [1926] gives it as August 22nd, while
Charles Dickens, in his A Child’s History of England, gives August 
23rd. Haydn’s Dictionary of Dates and Universal Information (ed. B. 
Vincent) [1904] has 24th August (old style) and 3rd September (new 
style). 

Encyclopedia Britannica, 14th edition [1939, vol. 8, p. 470]. Ejected 
at the same time from “a good living at Moreton, in Essex, near 
Chipping Ongar” was the father of Edmund Calamy (see Calamy 
[1830, vol. I, p. 65]). For further details of this ejectment see the 
entry “Nonconformity” in Hastings [1967]. 

Enclosures in [...] are the present author’s. 

The description is from Wilson [1814]; for further details of Frank- 
land see Holland [1962, p. 452]. Bogue and Bennett [vol. I, 1808,
p. 225] describe Frankland as “an eminent dissenting tutor, who 
taught university learning.” 

The distinction between such dissenting academies and the dissenting 
schools of that period is succinctly discussed in Holland [1962, p. 452]: 
see also Dale [1907]. Parker [1914] notes that while the Dissenting 
Schools were charity foundations, the Dissenting Academies “were 
schools of university standing” [p. 50]. Indeed, Hans states that “The 
Dissenting communities established an efficient substitute for Univer- 
sity education in their famous Academies, which combined theological 
with scientific training and produced many outstanding men of the 
eighteenth century” [1951, p. 15]. Further, in writing of the numerical 
significance of the dissenting academies, Hans remarks (op. cit.) that 
in a list he had drawn up from the Dictionary of National Biography 
of 3,500 men born between 1685 and 1785 who received any formal 
education in any school, “The total number of selected men produced 
by Dissenting Schools and Academies was 265, or about 10 per cent 
of all English cases, which was far above their relative strength in the 
total population of England in the eighteenth century” [p. 20]. 

Pearson [1978, p. 355]. Parker [1914] points out that, on Frankland’s
death in 1698, his academy at Rathmell declined: it was succeeded
by one under Chorlton’s tutorage. Parker [1914, p. 121] finds no
justification for the claim (see, for example, Holland [1962, p. 452]) that
Chorlton’s, and thus Frankland’s, Academy may be viewed as one of
the forerunners of Manchester College, Oxford. For further details of
Frankland’s Academy see Bogue and Bennett [vol. I, 1808].

Wilson [1814, p. 396]. 

The others were Joseph Bennett, Thomas Reynolds, Joseph Hill, 
William King, Ebenezer Bradshaw and Edmund Calamy. Bradshaw 
and Calamy were the sons of ejected ministers (see Calamy [1830]). 

Pearson [1978, p. 355]. See also Barnard [1964] and the more correct
Barnard [1958, p. 293]. The latter, however, mentions six, rather than 
seven, ordinees. 

Stephen [1885]. The following details are from Calamy [1830]: the 
ordainers on this occasion were Dr. Samuel Annesley, Mr. Vincent 
Alsop, Mr. Daniel Williams, Mr. Richard Stretton, Mr. Matthew 
Sylvester, and Mr. Thomas Kentish. The proceedings opened with 
a prayer by Dr. Annesley, followed by Mr. Alsop’s preaching from 
1 Peter v.1, 2,3. Mr. Williams then prayed, made a discourse concern- 
ing the nature of Ordination, and read the names and testimonials of 
those to be ordained. Confessions of faith on the part of the latter and 
prayers then followed, the whole concluding with a solemn charge, a 
psalm, and a prayer. “The whole,” according to Calamy [1830, p. 350], 
“took up all the day, from before ten to past six o’clock.” Before being 
ordained, each candidate had to defend a thesis upon a theological 
question, the several ministers present warmly opposing it. Joshua 
Bayes’s question was “An Deus sit Essentia sua omnipresens?” Aff. 
See also Bogue and Bennett [vol. II, 1809, pp. 121-122].

Calamy [1830]. 

In Southwark and Leather Lane, according to Pearson [1978, p. 355], 
though Wilson [1814] is more restrained and merely writes “It does 
not appear where Mr. Bayes spent the first years of his ministry, 
but it was, most probably, in the neighbourhood of London” [p. 397]. 
These peregrinations are not mentioned by Holland [1962] who writes 
“Joshua was ordained in 1694... at Little St. Helen’s Meeting House 
and was the minister at Box Lane, Bovingdon, Herts., until 1706” 
[p. 451]. 

Joshua succeeded Mr Edmund Batson (Wilson [1814, p. 312]). 

Stephen [1885]. Sheffield died in 1726 (Calamy [1830, vol. II, p. 487]).

Pearson [1978] refers to this gentleman as “Brook Taylor” [p. 355]:
this is clearly one of the slips that he would indubitably have
corrected had he prepared his lectures for publication (see the preface
to Pearson [1978]).

On early English presbyterianism see Anderson [1941, p. 160]. 

Pearson [1978, p. 355] and Stephen [1885]. For details of the other
ministers involved in this work see Bogue and Bennett [vol. II, 1809, 
p. 297]. 


James [1867], Stephen [1885] and Wilson [1814].

James [1867].

James [1867]. Joshua was deemed a Calvinist, “that is, such as agree
with the Assembly’s Catechism”, and Thomas an Armenian, “or such
as are far gone that way, by which are meant such as are against par- 
ticular election and redemption, original sin at least the Imputation 
of it, for the power of man’s will in opposition to efficatious Grace, 
and for Justification by sincere obedience in the room of Christ’s 
righteousness &c.” (The quotations are from Anon (b), pp. 87 & 88.) 
For details of the Salters’ Company see the 14th edition [1939] of 
the Encyclopedia Britannica [vol. 14, pp. 236-237]. The Merchants’ 
Lecture was originally established in 1672 in Pinners’ Hall, Broad 
Street, but after an attack on heresies by Daniel Williams in one of 
his lectures, a Presbyterian Lectureship was set up at Salters’ Hall in 
1694 (Dale [1907, p. 481]). 

Pearson [1978, p. 355], Stephen [1885] and Wilson [1814]. 


[1970, p. 531] and Maistrov [1974, p. 88]. Holland [1962, p. 452] points
out the error, and his assertion is vindicated by the Signatures in the
First Journal-Book and the Charter-Book of the Royal Society [1912].
Wilson [1814] writes “the inscription upon his tomb-stone says, in 
his 52nd year, but it is evidently a mistake” [p. 398]. The vault, 
in which the mortal remains of Thomas and other members of the 
Bayes family were also interred, and that had fallen into disrepair, 
was restored in 1969, the erroneous phrase being omitted from the 
inscription. An engraving of Joshua Bayes may be found in Wilson 
[1814]: it is copied from the portrait in Dr Williams’s Library. 
Wilson [1814]. 

This cemetery is referred to by Calamy [1830] as “... the new burial 
place for Dissenters, by Bunhill Fields, near London ... ”. Richard 
Price and his wife Sarah are also buried here: their tomb, like many
others here, is sorely in need of restoration. Hicks [1887] has suggested 
that the original name of the burial ground was Bon- or Bone-hill 
Fields; this is disputed by others. 

James [1867, p. 670] describes the congregation as “mainly trades- 
men”. Bogue and Bennett [vol. III, 1810, p. 495] write “Among the 
churches in London, the first rank of respectability was assigned to 
... Joshua Bayes”. Joshua played an active role in the nonconformist 
circles of his time, serving (sometimes as chairman) on the committee
of the three denominations (Presbyterian, Congregational and
Antipædobaptist) that saw to many matters, in and around London,
pertaining to nonconformity (see Minute Books of the Body...). 
Further comment on Bayes’s meeting house is given in Anon (b), as
follows: “This meeting house is about 15 square of building, with
3 Galleries. In 1695, Mr. Buris was minister to this people, but not
living many yeares after that time Mr. Christopher Taylor was chosen
Pastor in his room. he was accounted a G'man [= gentleman] of a 
bold spirit & a good preacher & about 1714 Mr. Bayes was chosen to 
assist him. Mr. Taylor dying about 1724 Mr. Bayes succeeded as pas- 
tor, & since that time Mr. Bayes Jun’ was chosen to assist his father. 
This congregation was never large. but were a people generally of sub- 
stance. It does not certainly appear what difference there is between 
the congregation in 1695 & the present, tho it is apprehended to be 
somewhat less. Mr. Bayes is a judicious serious and exact preacher 
and his composures appear to be laboured. He is of a good temper 
& well esteemed by his brethren. Mr. Bayes is a lecturer at Salter’s 
Hall” [p. 35]. It is also recorded here that “his congregation collects 
£100 annually for a fund to assist country ministers” [p. 89]. 
Holland [1962]: Rebecca’s name does not appear on the restored 
Bayes-Cotton vault in Bunhill Fields, while the year of Ann’s death, 
quoted here from Clay [1895], is given on the restored vault as 1758: 
the change might well have come about at the time of the restoration. 
The birth-years given here are found by subtracting the age at death 
from the year of death. 

On the 22nd September, according to the Bunhill Fields tombstone; 
but the obituary in The Gentleman’s Magazine ... 31 [1761, p. 188] 
has “Aug. 16. At Brighthelmstone, Mrs. Bayes, wife of Sam. Bayes 
esq. of Clapham”. 

This tract is discussed in Pearson [1978, p. 359 et seqq.]. Quotations 
are given in Barnard [1958, p. 294] and Holland [1962, pp. 454-455]. It 
was apparently unknown to de Morgan: see his [1860]. On probability 
as a guide in religious matters as in secular affairs see Shapiro [1983, 
p. 80]. 

Writing of Henry Grove, Bogue and Bennett [vol. III, 1810] state
that his “theological learning was considerable, and his attainments 
in polite literature were superior to those of most of his brethren... . 
Unhappily Mr. Grove was not sound in the faith; and as he advanced 
in years, he contracted a more keen and rooted aversion to evan- 
gelical doctrines” [p. 275]. Bogue and Bennett (loc. cit. and vol. I) 
also mention that Grove was a tutor in pneumatology and ethics
at an academy formed by Matthew Warren at Taunton, Somerset. 
On Robert Darch’s resignation, mathematics and natural philosophy 
were added to Grove’s department, and on Stephen James’s resigna- 
tion in 1725, he was appointed to the chair of divinity. 

The Dictionary of Anonymous and Pseudonymous English Literature 
(Halkett and Laing [1926]) attributes Divine Benevolence to Thomas 
Bayes, citing as authority Darling’s Cyclopaedia Bibliographica [1852-
1854]. The National Union Catalogue, on the other hand, attributes 
it to Joshua Bayes. The definitive statement is perhaps however made 
by Price [1787], who, in a footnote on page 429 of his paper, states 
that “The author [of Divine Benevolence] was Mr. Bayes, one of the 
most ingenious men I ever knew, and for many years the minister of 
a dissenting congregation at Tunbridge Wells.” 

Holland [1962] describes Bayes’s defence as “the most scathing reply” 
[p. 455]. 

Anderson [1941, pp. 160-161] and Wilson [1814, p. 401]. 

For comments on Whiston’s (curious) views on the universe see Gard- 
ner [1957, pp. 33-34]; for details of Whiston see Anderson [1941, 
p. 160], Holland [1962, p. 454] and Pearson [1978]. The application of
Whiston’s Newtonian biblical interpretation to social, political and 
theological issues in the context of the Newtonian movement is ex- 
plored in Force [1985]. 

Commenting on Whiston’s rapture with the writings of the early 
fathers, Bogue and Bennett [vol. III, 1810, p. 216] say “Nothing more 
is necessary to characterize the man.” 

Barnard [1958, p. 294]. 

Pearson [1978] and Whiston [1753, pp. 325-326]. 

Pearson [1978, pp. 349-350]. 

The amount a clergyman of that period could earn is mentioned in 
Henry Fielding’s Joseph Andrews of 1742, where the curate Abraham 
Adams, at the age of fifty, “was provided with a handsome income 
of twenty-three pounds a year” [chap. III] (by the end of the book,
however, Mr. Adams has been presented “with a living of one hundred 
and thirty pounds a year”). Somewhat later, Goldsmith describes the 
village preacher in The Deserted Village as follows: 


A man he was, to all the country dear, 
And passing rich with forty pounds a year. 


(As a comparison, note that this poem was itself published in 1770 
at a cost of two shillings!) 

The formation of the Independent church is reported by Timpson 
[1859, p. 466] as follows: “Having heard of a faithful dissenting minis- 
ter at Goudhurst, they [i.e. Thomas Baker, Edward Jarrett, an aged 
man, named Bunce, and Robert Jenner] went one Lord’s day, May 
21st, 1749, to hear him. Delighted with his sermon, they conversed 
with him, and he informed them that the Rev. Mr. Jenkins would 
be ordained pastor of the Independent church at Maidstone on the 
following Wednesday. They went to that service, and became thus ac- 
quainted with the Rev. Mordecai Andrews, of London; and he came 
with Mr. Booth, for the season, to the Wells, where he engaged the 
Presbyterian chapel, from the Rev. Mr. Bayes, its minister. They
joyed the gospel preached by ministers sent from London for nearly 
a year, until Easter Sunday, in 1750, when Mr. Bayes resumed his 
pulpit, disliking the doctrine of the Independents, and they again at- 
tended at the Established Church, for the sake of the Lord’s Supper.” 
The time-scale has been differently recorded: see Anon (a), “Early 
Presbyterianism at Tunbridge Wells”, where it is stated that the 
dissenters used Bayes’s meeting-house from 1743 to 1750. Thomas 
[1924], quoting Drysdale’s History of the Presbyterians in England 
writes “Presbyterian ministers were, as a whole, much more dignified 
and clerical in tone than their Independent brethren” [p. 25]. 

See Holland [1962, p. 456] and Timpson [1859, p. 464]. The latter 
writes of Bayes “he was a gentleman of fortune; but though he was 
said by the Rev. Mr. Onely, a clergyman of Speldhurst, to have been 
the best Greek scholar he had ever met with, he was not a popular 
preacher, nor evangelical in his doctrine.” 

Was this the original “Disgusted, of Tunbridge Wells”? 

Holland [1962, p. 456]. The quotation is from the “Church Book” 
of the Independents, which, according to Miss J. Mauldon of the 
Tunbridge Wells Library, has disappeared. 

Timpson [1859] writes that Bayes “bequeathed his valuable library to 
his successor, the Rev. William Johnson, M.A., who became minister 
of the chapel in 1752” [p. 464]. This statement has been repeated by 
Barnard [1958, p. 294], Holland [1962, p. 459], and Strange [1949, 
p. 17], but there is no mention of such a bequest in Bayes’s last will 
and testament. 

He directed, in his will, that his funeral expenses “may be as frugal as 
possible” [Holland 1962, p. 459]. The date of his death is given vari- 
ously as the 7th (The Gentleman’s Magazine and Historical Chronicle 
XXXI (1761), p. 188), the 14th (The London Magazine, or Gentleman’s
Monthly Intelligencer XXX (1761), p. 220) and the 17th (Rose
[1848]). See also Anderson [1941, p. 162], Jones [1849, p. 8], Waller 
[1865] and Wilson [1814]. According to Jones [1849], the original in- 
scription read “The Rev. Thomas Bayes, son of the said Joshua, died 
April 7th, 1761, aged 59 years.” What is reputed to be a portrait 
of Thomas Bayes may be found on page 335 of O’Donnell [1936], 
above the legend “Rev. T. Bayes Improver of the Columnar Method 
developed by Barrett.” No reference to the source from which the 
portrait is taken is given, and O’Donnell is elsewhere unreliable (see 
Dale [1988a], and, for further comment, Bellhouse [1988a], O’Hagan 
[1988] and Stigler [1988]). The portrait was reprinted in Press [1989] 
and Stigler [1980a]. 

This vault, having fallen into disrepair, was restored “In recognition 
of Thomas Bayes’s important work in probability ... in 1969 with 
contributions received from statisticians throughout the world.” In 
1988 O’Hagan noted that the tomb, after being sadly weathered, was
once again in good condition: I am sad to relate that on my last visit
(in 1996) I again found it in a sorry state.

According to Holland [1962, p. 452], Coward’s (later known as the 
Hoxton Academy) was the only academy in the London area from 
1716 to 1730. For further details of the Hoxton Academy see McLach- 
lan [1931, pp. 18, 118, 120]. 

Writing of Chauncy, Bogue and Bennett [vol. I, 1809, p. 35] say 
“Though a learned divine, he was not a popular preacher”. 

Although commenting with approval on Ridgeley’s suitability as a
theological tutor, Bogue and Bennett [vol. III, 1810] cannot forbear 
from noting that “had [Ridgeley’s] style but possessed neatness, ele- 
gance, and force, what an additional value it would have imparted to 
his ample treasures of sacred truth” [p. 283]. Ridgeley died in 1734, 
in his 67th year. 

Dale [1907, p. 501] writes “Eames, though distinguished as a scholar, 
was disabled for the ministry by a defect in the organs of speech, and 
by a pronunciation that was ‘harsh, uncouth, and disagreeable’. He 
once attempted to preach, but broke down, and never repeated the 
experiment.” Bogue and Bennett [vol. III, 1810] more sympathetically 
merely say “extreme diffidence and a defect in the powers of elocution 
deterred him from preaching more than one sermon” [p. 284]. Accord- 
ing to these authors (loc. cit.) Dr. Isaac Watts once remarked to one 
of his students “your tutor [i.e. Eames] is the most learned man I
ever knew.” McLachlan [1931] writes of Eames that “he was the only 
layman ever placed in charge of an academy, and, unlike most other 
tutors, published nothing. He was eminent alike in classics and math- 
ematics, attracted to his lectures, despite a lack of oratorical gifts, 
some of the most promising pupils of other academies, and after his 
death his lectures continued to be used in manuscript by tutors of 
academies other than his own” [p. 18]. Eames died suddenly in June
1744. 

The rules of Doddridge’s Academy are listed in Appendix III in 
Parker [1914]. Bogue and Bennett [vol. III, 1810] regard this Academy 
as a revival of one established at Kibworth, Leicestershire, by John 
Jennings in 1715, and temporarily suspended on his death. Doddridge 
had in fact been a student at Jennings’s Academy. Bogue and Bennett
(op. cit., p. 480) criticize Doddridge’s lectures for having “a tendency
to generate a controversial spirit”, and add further [p. 482] that “As a
man, he [i.e. Doddridge] cannot be said to have been endued with ge- 
nius in the highest sense, nor was his learning very profound, though 
it was extensive, rendering him respectable rather than eminent.” 
Chapter 3


1. See Brewer, The Dictionary of Phrase and Fable [1978, p. 938].

2. This pudency (or prudency?) seems first to have been noticed by 
William Morgan [1815, p. 24], who wrote “On the death of his friend 
Mr. Bayes of Tunbridge Wells in the year 1761 he [i.e. Price] was 
requested by the relatives of that truly ingenious man, to examine 
the papers which he had written on different subjects, and which 
his own modesty would never suffer him to make public.” Hacking 
[1965, p. 201] writes “Cautious Bayes refrained from publishing his
paper; his wise executors made it known after his death. It is rather 
generally believed that he did not publish because he distrusted his 
postulate, and thought his scholium defective. If so he was correct.” 
In 1971, however, and writing on this same point, Hacking says of 
Bayes that “His logic was too impeccable” [p. 347]. Stigler [1986a, 
p. 130] suggests that any reluctance Bayes might have felt towards 
publication could perhaps be attributed to difficulty in the evaluation 
of the integral of his eighth proposition. Good [1988] mentions three 
possible reasons for non-publication, viz. (i) the implicit assumption 
that a discrete uniform prior for r (the number of successes) implies 
a continuous uniform prior for p (the physical probability of a success 
at each trial), (ii) the essential equivalence of the assumptions as to 
the two priors in (i) when N (the number of trials) is large, and (iii) 
the first ball (by means of which p is determined) is essentially a red 
herring. 
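
The relation between the two uniform priors mentioned in point (i) can be checked in one direction by a short calculation. The sketch below is an illustration only, not Good's argument: it assumes the standard Beta integral, $\int_0^1 p^r (1-p)^{N-r}\,dp = r!\,(N-r)!/(N+1)!$, and verifies that a continuous uniform prior for p gives each value r = 0, 1, ..., N the same probability 1/(N + 1). The converse implication is the assumption Good singles out. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def marginal_prob(N, r):
    """Pr[r successes in N trials] when p is uniform on [0, 1].

    Computed exactly as C(N, r) times the Beta integral
    r! (N - r)! / (N + 1)!, using rational arithmetic.
    """
    beta_integral = Fraction(factorial(r) * factorial(N - r), factorial(N + 1))
    return comb(N, r) * beta_integral


if __name__ == "__main__":
    N = 10
    # Every entry should equal 1/(N + 1).
    print([marginal_prob(N, r) for r in range(N + 1)])
```

Exact fractions are used so that the equality with 1/(N + 1) holds identically rather than up to rounding.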

3. Canton is described in Pearson [1978] as “the Royal Society Secre- 
tary” [p. 369]. However his name does not appear in Pearson’s list 
of secretaries on p. 369, nor in Signatures in the First Journal-Book 
and the Charter-Book of the Royal Society: nor is he listed as hold-
ing office in the Royal Society in The Record of the Royal Society of 
London [1912]. 

4. Commenting on this letter, Savage, in an unpublished note of 1960 
(printed as the Appendix to the present work), wrote “this is appar- 
ently the first notice ever taken of asymptotic series”. On this point 
see Appendix 2.1 to Chapter 2. Deming (see Molina and Deming 
[1940, p. xvi]) states that the manuscript was submitted to the Royal 
Society by Price.

5. On works attributed to Bayes see Pearson [1978, pp. 360-361].

6. For reprints and summaries of the Essay the following should be con- 
sulted: Barnard [1958] (reprinted in Pearson and Kendall [1970]: note 
also comment in Sheynin [1969]), Bru and Clero [1988], Dinges [1983], 
Edwards [1978], R.A. Fisher [1956/1959], Molina [1931], Molina and 
Deming [1940] (reviewed by Lidstone [1941]), Press [1989] and Timerding
[1908] (see Pearson [1978, pp. 366, 369]). The 1918 catalogue of
the Printed Books in the Edinburgh University Library lists, as number
0*22.14/1, a work entitled “A Method of Calculating the Exact
Probability of All Conclusions founded on Induction. By the late Rev.
Mr. Thomas Bayes, F.R.S.” I am indebted to Mrs. Jo Currie of the
Special Collections section of that library for the information that
this work is in fact merely a reprint of the Essay: with it is bound the
supplement (listed as 0*22.14/2), both being reprinted in this edition
in 1764.

7. Unlike all Gaul.

8. Price’s nephew, William Morgan, writes: “Among these [i.e. Bayes’s
papers] Mr. Price found an imperfect solution of one of the most
difficult problems in the doctrine of chances ... ” [1815, p. 24]. Later
he speaks of Price as “completing Mr. Bayes’s solution”. It seems
clear from Price’s introductory remarks to the Essay, however, that
the major part of the latter was presented as Bayes had left it, though
Price did expand on the Rules given by Bayes.

9. See de Finetti [1972, p. 159].

10. See Savage [1960]. Hacking [1971] finds in Price’s introduction
“perhaps the most powerful statement ever, of the potential relations
between probability and induction” [p. 347].

11. Condorcet [1785b, p. lxxxiii] traces the idea to Jacques Bernoulli and
de Moivre.

12. Bernoulli’s Law of Large Numbers, in modern terms, runs as follows:
let $\{A_i\}$ be a sequence of independent events with $\Pr[A_i] = p$, where
$i$ is a natural number. For every $\epsilon > 0$,

$$\Pr\left[\,|S_n/n - p| > \epsilon\,\right] \to 0 \quad \text{as } n \to \infty,$$

where $S_n = \sum_{k=1}^{n} I_k$. (Here $I_k$ denotes the indicator function of $A_k$,
i.e. that function taking on the values 1 on $A_k$ and 0 off $A_k$.) Baker
[1975, p. 162] does not find it surprising that the foundations for
the inversion of Bernoulli’s Theorem were laid in England; he traces
this to some aspects of Newtonian philosophy. Further he notes (op.
cit., p. 166) the relationship between Bayes’s passage from a physical
model of probability to an epistemological interpretation, and Price’s
appendix showing clear evidence of the logic of Hume’s Treatise. For
further details on this last point see Gillies [1987].
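
The convergence asserted by Bernoulli's Law of Large Numbers can be illustrated numerically. The following is a minimal sketch under stated assumptions: the events are taken to be independent with common probability p, so that $S_n$ is binomial, and the tail probability is summed exactly from the binomial probabilities rather than simulated. The function name and the particular values of p and $\epsilon$ are illustrative choices, not taken from the sources cited.

```python
from math import comb


def tail_prob(n, p, eps):
    """Pr[|S_n/n - p| > eps] for S_n ~ Binomial(n, p), summed term by term.

    Plain floating-point arithmetic is used, which is adequate for n up to
    about 1000; much larger n would need logarithms to avoid overflow.
    """
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > eps:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


if __name__ == "__main__":
    # The printed probabilities should shrink towards 0 as n increases.
    for n in (10, 100, 1000):
        print(n, tail_prob(n, 0.3, 0.05))
```

The law says nothing about the rate of convergence; the sketch only shows the tail probability decreasing for the chosen p and $\epsilon$.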

13. Price also comments [p. 373] on the defects of the asymptotic nature
of de Moivre’s results. The respect in which Price held de Moivre is
commented on by Morgan [1815, p. 39] as follows: “In the first of
these papers [the two published in the Philosophical Transactions in
1770] he corrected an error into which M. De Moivre had fallen; ...
From the high opinion he entertained of the accuracy of De Moivre,
he conceived the error to be his own rather than that of so eminent
a mathematician, and in consequence puzzled himself so much in the
correction of it, that the colour of his hair, which was naturally black,
became changed in different parts of his head into spots of perfect
white.” See Dale [1988b] for a discussion of the relationship between
Bayes’s theorem and the inverse Bernoulli theorem. In the twelfth
volume (1763-1769) of the Philosophical Transactions (Abridged) we
find the following comment on Bayes’s problem: “In its full extent
and perfect mathematical solution, this problem is much too long
and intricate, to be at all materially and practically useful, and such
as to authorize the reprinting it here; especially as the solution of a
kindred problem in Demoivre’s Doctrine of Chances, p. 243, and the
rules there given, may furnish a shorter way of solving this problem.
See also the demonstration of these rules at the end of Mr. Simpson’s
treatise on ‘The Nature and Laws of Chance’.” [p. 41]. The reference
to de Moivre being to his Approximatio ad Summam Terminorum
Binomii $(a + b)^n$ in Seriem expansi, it appears that there was also
some confusion here between Bayes’s result and the inverse Bernoulli
theorem.

27. More correctly, one postulate in two parts.

14. The existence of this supplement to the Essay is not mentioned by
Barnard [1958] (see Sheynin [1969, p. 40]), although note is taken of
it in the note at the end of the reprint of his article in Pearson &
Kendall [1970].


. See Sheynin [1969] for further comments and discussion. 
. Bayes’s formulation of the problem is viewed by de Finetti, a leading 


subjectivist, as unsatisfactory — see his [1972, p. 158]. 


. This note is printed as the Appendix to the present work. 
. Seven definitions and seven propositions — is their purpose analogous 


to that of Carroll’s maids and mops? 


. But see Price’s introduction to the Essay, foot of p. 372. 
. See de Moivre [1756]. 
. Savage [1960] calls it “of course most interesting”. See also Bernoulli’s
Ars Conjectandi and de Moivre’s The Doctrine of Chances.


. See Savage [1960]. 
. Edwards [1974, pp. 44-45]. 
. See Fine [1973, pp. 60-61], Hacking [1975, pp. 152-153] and Shafer 


[1976a]. 


. Perhaps this supports de Finetti’s [1937] view that the idea of “repeated
trials” is meaningless for subjective probability; see Kyburg
and Smokler [1964, p. 102].
See Edwards [1978, p. 116] for references. 


Some writers, including Pearson [1920a] and R.A. Fisher [1956], have 
referred to Bayes’s table as a billiard table (which of course is not 
square). One might wonder whether such referral, occurring as it does 
in connexion with matters of chance, perhaps embodies a pun, as the 
word “hazard” was formerly used for a pocket of a billiard table. 
32.


33.
. See Dale [1982] and, for a contrary assertion, Edwards [1978, p. 117]. 


30. 


36. 
37. 
This specifies a uniform distribution in the plane: the deduction of a
uniform distribution over the side of the table is tacit. 

Note also Edwards’s [1978, p. 116] reformulation.

“A deliberately extramathematical argument in defense of Bayes’ pos- 
tulate”, Savage [1960]. 

This seems to imply exchangeability: for example, if a coin is tossed
three times, the scholium says that

Pr[3 heads] = Pr[2 heads] = Pr[1 head] = Pr[0 heads]

(= 1/4 presumably!), and hence, for example

Pr[HHT] = Pr[HTH] = Pr[THH].
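
The equalities in this note can be verified under one explicit assumption, namely that the chance of a head is itself uniformly distributed on [0, 1] (the usual reading of Bayes's postulate; the function names below are illustrative). The sketch integrates the binomial probabilities exactly, using the standard value k! m!/(k+m+1)! for the integral of x^k (1−x)^m over [0, 1].

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(k, m):
    """Integral over [0, 1] of x^k (1 - x)^m dx, equal to k! m! / (k + m + 1)!."""
    return Fraction(factorial(k) * factorial(m), factorial(k + m + 1))

def prob_heads_count(n, k):
    """Pr[k heads in n tosses] when the chance of a head is uniform on [0, 1]."""
    return comb(n, k) * beta_integral(k, n - k)

def prob_sequence(seq):
    """Pr[one specific sequence of H and T] under the same uniform assumption."""
    k = seq.count("H")
    return beta_integral(k, len(seq) - k)

counts = [prob_heads_count(3, k) for k in range(4)]
print(counts)
print({s: prob_sequence(s) for s in ("HHT", "HTH", "THH")})
```

Each head count receives probability 1/(n+1), and sequences with the same number of heads receive equal probability, which is the exchangeability property referred to.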


See Edwards [1974, p. 48] and Zabell [1982]. 
See Savage [1960]. 


For comment on Bayes’s evaluation of the incomplete beta-integral 
see Lidstone [1941, pp. 178-179], Molina and Deming [1940, pp. xi- 
xii], Sheynin [1969, p. 4], [1971a, p. 235], Timerding [1908, pp. 50-51] 
and Wishart [1927]. The last of these authors points out [p. 10] an 
erroneous value given by Bayes and undetected by Timerding. For 
a detailed study of the incomplete beta-function see Dutka [1981], 
while Hald [1990b] may be consulted for information about the con- 
tributions of Bayes and Price to the evaluation of the beta probability 
integral. 

The beta distribution seems to be ascribable to Bayes (see Sheynin 
[1971a, p. 235]): the beta-function, first given by Euler in 1730 (hence 
its being known also as the Eulerian integral of the first kind), was 
given this name by Binet in 1839, according to Cajori [1929, §649]. 
See Pearson [1978, p. 369]. 

Notice that this is also framed in terms of ratios of causes — see page 
406 of the Essay. 

Note the comment by Waring in Todhunter [1865, art. 839]. See also
Savage [1960], and Pearson [1978, pp. 365-366]. It seems that Price, 
and not Bayes, was perhaps the first to frame a sort of “rule of succes- 
sion” argument. See Keynes [1921/1973, chap. XXX] for commentary 
on this rule. 

Dinges [1983, p. 95] is one of the few authors to acknowledge this 
problem as being posed by Bayes. The mentioning of events occurring 
under the same circumstances as they have in the past can perhaps 
be traced back to G. Cardano (1501-1576), in whose Liber de Ludo 
Aleæ, caput VI, we read “Est autem, omnium in Alea principalissimum,
aequalitas, ut pote collusoris, astantium, pecuniarum, loci,
fritilli, Aleae ipsius.” There is some discussion of Cardano’s mathematical
work in Cajori [1893, pp. 134-136].




Chapter 4 


1.


10. 


In his review of Molina and Deming [1940], Lidstone [1941] says that 
Todhunter’s criticism is “rather harsh, and it is in any case based on a 
particularly high standard of comparison” [p. 179]. He also notes that 
“De Morgan, who was no bad judge, was much more appreciative” 
(loc. cit.). 


. The Doctrine of Chances [1756, pp. 1-3].
. The definitions of independence provided by Price and de Moivre are 


discussed in §5.3.4. 


. Private communication of February 1992. 
. Savage [1960] remarks that the derivations of the propositions
“are beclouded by the idea that numbers are a little more shameful
than ratios”.


. See Shafer [1978, p. 345]. Shafer also points out (loc. cit.) that de
Moivre was apparently the first to give a statement of a rule of additivity
for probabilities, viz. $\Pr[E] + \Pr[\bar{E}] = 1$.


. In his discussion of inverse probability, Wrighton [1973, p. 36] de- 


clares that in those circumstances in which it is legitimate to identify 
relative frequencies with probabilities, the temporal order involved in 
the generation of a random event may safely be ignored. 


. Hartigan [1983, p. 6] has drawn attention to the fact that Bayes’s 


definition “describes how a person ought to bet, not how he does 
bet.” See also Stigler [1982a, p. 250]. For a detailed discussion of 
whether Bayes’s concept of probability places him in the subjective 
or objective school see Dinges [1983, §6]. Note also Shafer [1985].


. The changing of a (subjective) probability from $\Pr[\cdot]$ to $\Pr[\cdot \mid E]$
upon the discovery that the event $E$ obtains (or, under certain
interpretations, that the proposition $E$ is true), is called the process
of conditionalization by Richard Jeffrey; see his [1983, pp. 165,
171-2]. Jeffrey’s Rule states that a probability $P$ is altered to a new
probability $P^*$, based on a partition $\{E_i\}_1^n$, by setting

$$P^*[A] = \sum_{i=1}^{n} P[A \mid E_i]\, P^*[E_i],$$

where $\{E_i\}$ is a partition for which $P^*[\cdot \mid E_i] = P[\cdot \mid E_i]$ for all $i$. The
mathematical properties of this rule have been discussed by Diaconis
and Zabell [1982].
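
Jeffrey's Rule lends itself to a short numerical sketch. Everything concrete below is an assumption made for illustration: the weather labels, the joint probabilities and the function name are hypothetical. The code takes an old joint distribution over (outcome, partition cell) pairs and new probabilities for the cells, and returns the revised outcome probabilities; setting one cell's new probability to 1 recovers ordinary conditionalization.

```python
from fractions import Fraction

def jeffrey_update(joint, new_cell_probs):
    """Jeffrey's rule: new P[a] = sum over cells e of old P[a | e] * new P[e].

    joint[(a, e)] is the old probability of outcome a together with cell e;
    new_cell_probs[e] is the revised probability of cell e.
    """
    old_cell_probs = {}
    for (a, e), pr in joint.items():
        old_cell_probs[e] = old_cell_probs.get(e, 0) + pr
    updated = {}
    for (a, e), pr in joint.items():
        conditional = pr / old_cell_probs[e]  # old P[a | e], left unchanged by the rule
        updated[a] = updated.get(a, 0) + conditional * new_cell_probs[e]
    return updated

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
joint = {
    ("rain", "cloudy"): Fraction(3, 10),
    ("dry", "cloudy"): Fraction(2, 10),
    ("rain", "clear"): Fraction(1, 10),
    ("dry", "clear"): Fraction(4, 10),
}
shifted = jeffrey_update(joint, {"cloudy": Fraction(4, 5), "clear": Fraction(1, 5)})
certain = jeffrey_update(joint, {"cloudy": Fraction(1), "clear": Fraction(0)})
print(shifted, certain)
```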

According to Savage [1960], this section of the Essay contains but a 
germ of a theorem about the probability of causes (so often wrongly 
attributed to Bayes). The present discussion owes much to Edwards 
[1978]. For further comment see Dinges [1983, §4]: the latter asserts
that Bayes shines in this section “als ein Experte der damaligen In- 
tegralrechnung” [p. 80]. 
These are phrased respectively by Edwards [1978, p. 117] as “the
event ‘that the first ball thrown lies in a particular interval on the 
table’” and “the event ‘that the probability at each of the subsequent 
trials lies in a particular interval’.” 

Edwards [1978, p. 117] states “If, therefore, all Bayes’ propositions 
are interpreted in terms of the event ‘that the probability lies in a 
particular interval’ rather than ‘that the ball lies in a particular in- 
terval’, the first postulate, of a uniform table, is redundant.” 

One might well see, in the introduction of $dF(\cdot)$ here, something
analogous to the method of arbitrary functions (fonctions arbitraires) 
used by Poincaré [1896], Hostinsky [1920], [1926] and [1931], and Hopf 
[1936]. A recent discussion by von Plato [1983] traces the introduc- 
tion of this method into probability theory to von Kries [1886]. 

The well-known birthday problem (see Feller [1968, p. 33]), at- 
tributed by Ball and Coxeter [1974, p. 45] to Harold Davenport, is 
usually stated under the assumption that birthdays are uniformly 
distributed throughout the year. Bloom [1973], in solving a problem 
posed by Knight, shows that the solution $1 - (365)_n/365^n$ obtained
under (1) independence and (2) uniform distribution of birthdays, is 
a lower bound attained only when all days are equally probable. 

The birthday problem has received much attention over the years: 
for further details and generalizations Abramson and Moser [1970], 
Gehan [1968], Holst [1986], Joag-Dev and Proschan [1992], McKinney
[1966], Munford [1977], Naus [1968], Nunnikhoven [1992], Sandell 
[1991] and Schwarz [1988] may be consulted. 
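
The lower-bound property attributed to Bloom can be illustrated numerically. In the sketch below the skewed distribution (100 days twice as likely as the remaining 265) is a made-up example, and the function name is illustrative. The probability that n independent birthdays are all distinct is computed as n! times the elementary symmetric polynomial of degree n in the daily probabilities.

```python
from fractions import Fraction

def prob_shared_birthday(day_probs, n):
    """Pr[at least two of n people share a birthday], birthdays independent
    and drawn from the distribution day_probs."""
    # e[j] builds up the elementary symmetric polynomial of degree j.
    e = [Fraction(1)] + [Fraction(0)] * n
    for p in day_probs:
        for j in range(n, 0, -1):
            e[j] += p * e[j - 1]
    all_distinct = e[n]
    for i in range(1, n + 1):  # multiply by n! to count ordered assignments
        all_distinct *= i
    return 1 - all_distinct

n = 23
uniform = [Fraction(1, 365)] * 365
weights = [2] * 100 + [1] * 265          # hypothetical non-uniform year
skewed = [Fraction(w, sum(weights)) for w in weights]

uniform_match = prob_shared_birthday(uniform, n)
skewed_match = prob_shared_birthday(skewed, n)

# Closed form 1 - (365)_n / 365^n for comparison.
falling = 1
for i in range(n):
    falling *= 365 - i
closed_form = 1 - Fraction(falling, 365**n)

print(float(uniform_match), float(skewed_match))
```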

Bayes’s use of EF for fa is a happy one: according to Cajori [1929, 
§439], the symbol was introduced in 1827 by Andreas von Ettings- 
hausen. 

Molina [1930, p. 383] points out that this important fact was omitted 
by Todhunter: “Failure to appreciate this point kills the significance 
of Bayes’ scholium”. (Todhunter also failed to discuss the scholium.) 
On the equivalence stated in this proposition see my preceding re- 
marks and Edwards [1978, p. 117]. 

See Maistrov [1974, p. 92]. 

I have throughout interpreted integrals as areas and vice versa. 

The argument advanced by Bayes here has been well summarized by 
Edwards [1978, p. 111]. For further discussion of the Scholium see 
Dinges [1983, §8] and Gillies [1987, §5].

See de Finetti [1932] and Feller [1966, p. 224]. 

Hardy’s result runs as follows: let $\chi_1$ and $\chi_2$ be functions of bounded
variation, vanishing at the origin and with normal discontinuities. If

$$\int_0^1 x^n \, d\chi_1 = \int_0^1 x^n \, d\chi_2 ,$$

then $\chi_1 = \chi_2$ for all $x$. (The discontinuities of a function $\chi$ are said
to be normal if

$$\chi(x) = [\chi(x - 0) + \chi(x + 0)]/2$$

for $x \in (0, 1)$.)

This result is concisely given in (ii) in Hardy [1949/1991, p. 261].

See Hardy [1949/1991, pp. 261-262] for a discussion of examples involving
other choices of moment sequences $\{\mu_n\}$.

Some knowledge of the negative binomial distribution was certainly 
available in Bayes’s time. In his analysis of the correspondence be- 
tween Pascal and Fermat in the 1650’s, Hald [1990a, p. 61] suggests 
that while the former used the binomial distribution with p = 1/2 
to solve the problem of points, the latter used the negative binomial 
distribution in the same question. The problem of points was also 
studied by Montmort [1713] and de Moivre [1756] — see Hald, op. 
cit., for a discussion of their contributions. 
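
The two approaches to the problem of points mentioned here give the same answer, as a small computation shows. This is a sketch in modern notation, not a reconstruction of either author's working; the function names and the example scores are assumptions. Player A needs a more points, player B needs b, and A wins each round with probability p.

```python
from fractions import Fraction
from math import comb

def win_prob_binomial(a, b, p=Fraction(1, 2)):
    """Imagine a + b - 1 further rounds; A wins the stake with at least a of them."""
    n = a + b - 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

def win_prob_negative_binomial(a, b, p=Fraction(1, 2)):
    """A wins if B has scored k < b points at the moment A scores the a-th point."""
    return sum(comb(a - 1 + k, k) * p**a * (1 - p)**k for k in range(b))

print(win_prob_binomial(2, 3), win_prob_negative_binomial(2, 3))
```

The binomial form counts outcomes of a fixed number of rounds, while the negative binomial form stops play as soon as the match is decided.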


. Stigler [1982a, p. 250] cites Karl Pearson, R.A. Fisher, Ian Hacking 


and Harold Jeffreys as misinterpreters of the argument. 


. Bayes of course does not qualify the noun. 

. Compare this with the discussion in §3.6. 

. Notation altered. See also Geisser [1985, pp. 203-205].

. Stigler [1982a, p. 253] finds it “tempting to speculate that it was 


Reverend Thomas Bayes’s experience as a minister that made this 
approach more congenial than his original postulate of an a priori 
uniform distribution for θ [my x]: All men may know the works of
God, and through these works know God, but only men of great faith 
know God directly.” 

“Unlike the marvellously flexible principle of insufficient reason, which 
is immediately (if dubiously) adaptable to any parametric model” 
[Stigler 1982a, p. 253]. 

On this point it is worth noting Edgeworth’s comments: 


where we are concerned only with a small tract of values 
it will often happen that both the square and the square 
root and any ordinary function of a quantity which assumes 
equivalent values with equal probability will each present 
an approximately equal distribution of probabilities, 
[1911, §8] 


and again 


when the magnitude for whose various values we claim 
equal probability is very large in comparison with the tract 
through which it varies, then it comes to much the same 
whether the equi-probability is claimed for the magnitude 
itself or for some (ordinary) function thereof — the square, 
or square root, or reciprocal, etc. [1922, p. 263] 
Mention has already been made of R.A. Fisher’s arcsin example
(described before as “shabby”, but perhaps the Dickensian “shabby
genteel” would be more exact). O’Hagan [1994, §§ 3.29-3.34] has discussed
the rôle of improper priors in the expression of prior ignorance.
He notes that difficulties can arise in the consideration of $\phi = g(\theta)$
if $g$ is not a one-to-one transformation when $\theta$ is discrete, with similar
problems occurring if $g$ is one-to-one and nonlinear when $\theta$ is
continuous. For example $\theta$ and $\phi = \theta^2$ cannot both have uniform
distributions on [0,1]. Moreover, the use of an improper prior may
result in a posterior that is also improper.

In a footnote on p. 405 Price writes “There can, I suppose, be no
reason for observing that on this subject unity is always made to
stand for certainty, and ½ for an even chance.”

See Dale [1982].

For good discussions of the rule of succession see Keynes [1921/1973,
chap. XXX] and Zabell [1989b]. Gillies [1987, §3] distinguishes between
“Price’s rule of succession”, viz.

$$\int_a^b x^n \, dx \Big/ \int_0^1 x^n \, dx = b^{n+1} - a^{n+1} ,$$

and “Laplace’s rule of succession”, viz.

$$\int_0^1 x^{n+1} \, dx \Big/ \int_0^1 x^n \, dx = (n+1)/(n+2) .$$

By the unappellational, or uneponymised, term we shall always mean
the latter of these rules. Herschel, in his 1850 review of Quetelet on
probabilities, showed a clear understanding of the two rules. He wrote
“the expectation that the sun will rise tomorrow, grounded on the sole
observation of the fact of its having risen a million times in unbroken
succession, has a million to one in its favour. But to estimate the
probability, drawn from that observation, of the existence of an influential
cause for the phenomenon of a daily sunrise, we have to raise
the number 2 to the millionth power ... and the ratio of this enormous
number to unity, is that of the probability of the phenomenon
having happened by cause, to that of its having happened by chance”
[Herschel 1857, p. 415].
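
The difference between the two rules of succession is easy to see numerically. The sketch below assumes a uniform prior on the unknown chance x and n successes in n trials; the function names and the choice n = 10 are illustrative. The first function gives the posterior probability that x lies between a and b, the second the probability of a further success.

```python
from fractions import Fraction

def price_rule(n, a, b):
    """Posterior Pr[a < x < b] after n successes in n trials:
    (integral of x^n from a to b) / (integral of x^n from 0 to 1)."""
    return b ** (n + 1) - a ** (n + 1)

def laplace_rule(n):
    """Pr[success on trial n + 1] after n successes in n trials:
    (integral of x^(n+1) over [0, 1]) / (integral of x^n over [0, 1])."""
    return Fraction(n + 1, n + 2)

n = 10  # illustrative number of observed successes
more_than_even = price_rule(n, Fraction(1, 2), Fraction(1))
next_success = laplace_rule(n)

# Express both as odds in favour.
odds_cause = more_than_even / (1 - more_than_even)
odds_next = next_success / (1 - next_success)
print(more_than_even, next_success, odds_cause, odds_next)
```

With n successes the odds for a further success are n + 1 to 1, while the odds that the chance exceeds one half are 2^(n+1) − 1 to 1; this roughly matches the contrast Herschel draws between "a million to one" and a power of 2, though the exact exponents in his wording are not reproduced by this simple model.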

Price was well acquainted with Condorcet: see Pearson [1978, p. 375]. 
For further comment on the solution see Lidstone [1941, p. 179] and 
Pearson [1978, pp. 368-369]. Pearson (loc. cit.) finds neither Tod- 
hunter nor Timerding illuminating on this point. Note also the dis- 
cussions in Gillies [1987, pp. 332-333] and Zabell [1988a], [1989b]. 
Dinges [1983, pp. 67-68] points out that, while Price’s intervals should 
not be interpreted as confidence intervals, they could perhaps be considered
in a fiducial context. See Barnard [1987] for a justification of
R.A. Fisher’s claim that the term “probability” is used in the same
sense both in the fiducial argument and in Bayes’s Essay.

37. An example described by Lidstone [1941, p. 179] as “now notorious”. 

38. See, for example, §XI, “Of the probability of chances”, of his Trea- 
tise of Human Nature, where we find the words “One wou’d appear 
ridiculous, who wou’d say, that ’tis only probable the sun will rise to- 
morrow, or that all men must dye; tho’ ’tis plain we have no further 
assurance of these facts, than what experience affords us.” Further, 
in his Essays Literary, Moral and Political, we find in §4, “Sceptical 
doubts concerning the operations of the understanding” of the Inquiry
concerning Human Understanding, the sentence “That the sun
will not rise to morrow, is no less intelligible a proposition, and implies
no more contradiction, than the affirmation, that it will rise.”
And as a footnote to §6, “Of probability”, in the same essay, we have
“... we must say, that it is only probable all men must die, or that
the sun will rise tomorrow.” 

39. No doubt our observer is “an agéd, agéd man” by now! 

40. For further discussion of this topic see Zabell [1989b]. 

41. Lidstone [1941, p. 179] emphasizes Price’s recognition of the distinc- 
tion between casual and causal. 

42. For a discussion of (part of) the Essay from a decision-theoretic view- 
point see Dinges [1983, §2]. While noting that Bayes was not decision 
orientated, Ferguson [1976, p. 338] states that in the Essay “Not 
even the probability of the occurrence of the event on the next trial 
is calculated”, though such a calculation is given in the Appendix. 
He ascribes the first such calculation to Laplace in 1774. 


Chapter 5 


1. The spelling is from his own hand, rather than as given by Todhunter 
[1865]. His birth-date is given by Lausch [1993] as 12 Ellul 5489 or 
5488 (6th September 1729 or 17th August 1728); the date of his death 
is incorrectly given as 1796 by Lancaster [1968]. 

2. Keynes [1921/1973], Pearson [1978] and Todhunter [1865] all give the 
incorrect date 1771 (the first edition was printed in 1761). 

3. Cajori [1893/1991, p. 53] writes “The Greeks had the name epimorion
for the ratio $\frac{n}{n+1}$.” However Liddell and Scott [1968] give the definition

ἐπιμόρι-ασμός formation of a number of the form $1 + \frac{1}{n}$
-ος containing a whole + a fraction with 1 for the
numerator $(1 + \frac{1}{n})$,

which seems to be the reciprocal of Cajori’s epimorion.
. According to Lausch [1993, p. 9], Mendelssohn’s paper Gedanken von
der Wahrscheinlichkeit was read by proxy in September 1756, he
being a stutterer.


. See also Walley [1991, §5.3.4]. 
. One must bear in mind, however, that Mendelssohn was regarded as 


one of the best mathematicians in Berlin at that time — see Lausch 
[1993, p. 19]. Mendelssohn’s mathematical work is discussed in Lausch 
[1990]. 


. For details see Sheynin [1971c]. 
. There is no complete edition of Lambert’s works (see Sheynin [1971c, 


p. 244]), and his publications are in fact not easily accessible. 


. This extract is from Bernoulli’s Meditationes: the longer passage from 


which it is taken is reprinted in Bernoulli [1975, pp. 42-48], and this 
passage is dated there as Winter 1685/86 by B.L. van der Waerden.
Confusion between Pr[cause|event] and Pr[event|cause] is by no means 
uncommon: even de Moivre was apparently unclear at times on the 
distinction, for he wrote 


Further, the same Arguments which explode the Notion of 
Luck, may, on the other side, be useful in some Cases to 
establish a due comparison between Chance and Design: 
We may imagine Chance and Design to be, as it were, in 
Competition with each other, for the production of some 
sorts of Events, and may calculate what Probability there 
is, that those Events should be rather owing to one than 
to the other. To give a familiar Instance of this, Let us 
suppose that two Packs of Piquet-Cards being sent for, it 
should be perceived that there is, from Top to Bottom, the 
same Disposition of the Cards in both Packs; let us likewise 
suppose that, some doubt arising about this Disposition of 
the Cards, it should be questioned whether it ought to be 
attributed to Chance, or to the Maker’s Design: In this Case 
the Doctrine of Combinations decides the Question; since 
it may be proved by its Rules, that there are the Odds 
of above 263130830000 Millions of Millions of Millions of 
Millions to One, that the Cards were designedly set in the 
Order in which they were found. [1756, p. v] 
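
De Moivre's figure can be checked against the number of orderings of a 32-card piquet pack. The sketch below also shows why the figure is a likelihood ratio rather than a posterior probability: the function `posterior_odds_of_design`, its default that design always yields matching packs, and the prior odds used are all assumptions introduced for illustration.

```python
from fractions import Fraction
from math import factorial

# A piquet pack has 32 cards, so a pack can be arranged in 32! orders.
orderings = factorial(32)

# The printed figure: 263130830000 followed by four "Millions" (10**24).
de_moivre_figure = 263130830000 * 10**24

def posterior_odds_of_design(prior_odds, p_match_given_design=Fraction(1)):
    """Odds of design against chance after seeing two identically ordered packs.
    Assumes a chance match has probability 1/32!."""
    p_match_given_chance = Fraction(1, orderings)
    return prior_odds * p_match_given_design / p_match_given_chance

print(orderings)
print(posterior_odds_of_design(Fraction(1)))
print(float(posterior_odds_of_design(Fraction(1, 10**30))))
```

Only when the prior odds are even do the odds of design equal the bare count of orderings, which is the distinction between Pr[event|cause] and Pr[cause|event] at issue.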


The point has been carefully described by Keynes, who, in a discus- 
sion of credibility, wrote 


The manner in which the resultant probability is affected
depends upon the precise meaning we attach to “degree
of reliability” or “coefficient of credibility.” If a witness’s
credibility is represented by x, do we mean that, if a is the
true answer, the probability of his giving it is x, or do we
mean that if he answers a the probability of a’s being true
is x? These two things are not equivalent.
[1921, chap. XVI, §18]


That a similar confusion exists in diagnostic screening tests in con- 
nexion with the definition of “false-positive rate” has been pointed 
out by Zabell [1988b, p. 332]. See also Lindley [1997, p. 149] and the 
reply by Berger et al. [1997, p. 158].
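
The non-equivalence of the two readings of a witness's credibility (and of the two readings of a "false-positive rate") follows from Bayes's theorem. In the sketch below all numbers are hypothetical and the function name is illustrative: the witness asserts a with probability 9/10 when a is true and with probability 1/10 when it is false.

```python
from fractions import Fraction

def prob_true_given_asserted(prior, p_assert_if_true, p_assert_if_false):
    """Pr[a is true | witness asserts a], by Bayes's theorem."""
    numerator = prior * p_assert_if_true
    return numerator / (numerator + (1 - prior) * p_assert_if_false)

x = Fraction(9, 10)  # hypothetical Pr[witness asserts a | a is true]
rare = prob_true_given_asserted(Fraction(1, 100), x, Fraction(1, 10))
even = prob_true_given_asserted(Fraction(1, 2), x, Fraction(1, 10))
print(rare, even)
```

With these assumed figures the two quantities coincide only when the prior probability of a is one half; for a rare a the probability that the assertion is true falls far below x.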

Gigerenzer [1994, p. 134] notes that Bernoulli’s Ars Conjectandi has
been seen at various times as providing support for each of what 
we now consider to be the subjective, the logical and the frequency 
interpretations of probability. 

Hailperin is not convinced by Shafer’s arguments of 1978 for the in- 
terpretation of Bernoulli’s probability as non-additive: see his [1996, 
pp. 59-60]. He also notes (op. cit., p. 66) that it would be difficult
to reconcile the advocation (or adoption) of non-additive probabili- 
ties with the desire to obtain probabilities from frequencies (say, by 
Bernoulli’s law of large numbers). 

Hailperin [1996, p. 73] finds the basis for Lambert’s assertion that 
(in essence) $\Pr[A] + \Pr[\bar{A}] < 1$ to be faulty, and hence he gives no
support to claims of evidence of the use of non-additive probability.
The example of the syllogism Barbara considered by Lambert is of 
the following form: 


all A are B
C is A
therefore C is B.


This is then modified to each of the following in turn: 


$ A are B all A are B 
Cis A C is Z A 
therefore C 2 is B, therefore C 5 is B, 


other examples in which both premises are qualified also being given. 
For further details see Shafer [1978, pp. 357-358]. 

For further discussion see Molina and Deming [1940, p. xv]: this paper 
is reviewed by Lidstone [1941]. A general discussion of Bayes’s work 
on infinite series may be found in Dale [1991].
See Chapter 2, Note 35.

Deming in fact gives as a fourth reason the fact that in his Approximatio
ad Summam Terminorum Binomii $(a+b)^n$ in Seriem expansi
of 1733 de Moivre mentions his work on the series

$$\frac{1}{12} - \frac{1}{360} + \frac{1}{1260} - \frac{1}{1680} + \text{\&c.},$$

a series whose connexion with Bayes’s series, with z = 1, is clear. I
must admit to not finding this reason very compelling.
For a form in which Bernoulli numbers are used see Archibald [1926,
p. 675]. 
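
The terms 1/12, −1/360, 1/1260, −1/1680 of de Moivre's series can be generated from Bernoulli numbers. The sketch below assumes the familiar Stirling-type form B_{2k}/(2k(2k−1)z^{2k−1}) with z = 1; whether this is exactly the form given by Archibald is not checked here, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """B_0, ..., B_m (with B_1 = -1/2), from sum_{j=0}^{k} C(k+1, j) B_j = 0 for k >= 1."""
    B = [Fraction(1)]
    for k in range(1, m + 1):
        s = sum(comb(k + 1, j) * B[j] for j in range(k))
        B.append(-s / (k + 1))
    return B

def stirling_terms(count, z=1):
    """First `count` terms B_{2k} / (2k (2k - 1) z^(2k - 1)), k = 1, 2, ..."""
    B = bernoulli_numbers(2 * count)
    return [B[2 * k] / (2 * k * (2 * k - 1) * Fraction(z) ** (2 * k - 1))
            for k in range(1, count + 1)]

terms = stirling_terms(4)
print(terms)
```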

See Dale [1991] and Molina and Deming [1940, p. xvi]. 

One must concur with Todhunter [1865, art. 553] that “these inves- 
tigations are very laborious, especially Price’s.” 

Perhaps even Price found these calculations wearisome! 

See Sheynin [1969]. 

The second term in the summand in the numerator below should have
a coefficient “2”. 

This value is arrived at by finding the second derivative of the beta 
density (or the “Bayessche Kurve”, as Timerding [1908, p. 51] calls 
it) $k x^p (1-x)^q$.

For further details of this approximation see Sheynin [1969, §5] and 
Timerding [1908, pp. 58-59]. 

Shelton published two shorthand books: Tachygraphy [1641] and Zeiglographia
[1654]. It is the latter of these that is most closely related
to Coles’s work of 1674. 

This notation is of course not used by Bayes. Indeed, there is some
confusion as to what is meant by the term “incomplete beta-function”,
Dutka [1981] and Jordan [1965] using it, as we have done, for $B_x(a, b)$,
while Beyer [1968] uses it for $I_x(a, b) = B_x(a, b)/B(a, b)$. Dutka (op.
cit.) refers to $I_x(a, b)$ as the “incomplete beta-function ratio”.
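
The two usages can be told apart with a small exact computation. The sketch below is restricted to positive integer a and b, an assumption made so that the ratio can be written as a binomial tail sum; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def beta(a, b):
    """Complete beta function B(a, b) for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def incomplete_beta_ratio(x, a, b):
    """I_x(a, b) = B_x(a, b) / B(a, b), via the binomial tail identity."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def incomplete_beta(x, a, b):
    """B_x(a, b): the integral of t^(a-1) (1 - t)^(b-1) from 0 to x."""
    return incomplete_beta_ratio(x, a, b) * beta(a, b)

x = Fraction(1, 2)
print(incomplete_beta(x, 2, 1), incomplete_beta_ratio(x, 2, 1))
```

For a = 2, b = 1 the integral of t from 0 to x is x²/2, whereas the ratio is x², so the two quantities differ by the factor B(2, 1) = 1/2.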
References in this section are to the second edition of 1768. The Four 
Dissertations was reviewed in The Monthly Review 36 (1767), pp. 51- 
66 & 80-93. 

For his nephew’s views on Price’s use of Bayes’s results see Morgan 
[1815]. 

For comment on Price’s criticism of Hume’s views see Gillies [1987] 
and Sobel [1987, p. 169]. The relationship between miracles and statis- 
tics is explored in Kruskal [1988]. See also Zabell [1988a], [1988b]. 
This result seems to suggest that 


$\Pr[A \cap B] = \Pr[A \cap \bar{B}] = \Pr[A]$


when B has no influence upon A’s occurrence. A modern interpreta- 
tion would replace the joint probabilities with conditional ones. The 
same sort of thing occurs in Laplace’s Essai philosophique sur les 
probabilités — see Note 8, p. 181, in the translation by Dale [1995]. 

According to Whittaker [1951], in the century between Isaac New- 
ton’s death and George Green’s scientific activity, “the only natural 
philosopher of distinction who lived and taught at Cambridge was 
Michell” [p. 153]. A similar sentiment is expressed in the ninth edi- 
tion of the Encyclopedia Britannica, where Michell is described as an 
“eminent English man of science”; and McCormmach entertains the
view that Michell was “the most inventive of the eighteenth-century
natural philosophers” [1968, p. 127]. This view is endorsed by Taylor
[1966], who describes Michell as having “a universal curiosity”
[p. 71] and as being “an excellent example of the ‘universal’ man of
mathematical practice, who can be found in the eighteenth century”
[p. 213].

This paper is considered by Hardin [1966, p. 36] as the most important 
of Michell’s astronomical works. 

Arbuthnott’s name is spelled variously with one or two t’s. 

This essay and the discussion it engendered are discussed in Sheynin 
[1973, §5], where we read 


In all, ARBUTHNOT’S is a confused and superficial piece 
of writing, the main merit of which is its clear statement 
substantiated by a table of births revealing the actually 
observed predominance of male births in London. [p. 303] 


Michell gives the ratio $(60'/6875.5')^2$, the denominator in fact being
2 radians.
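
One way to read this ratio, offered here as an interpretation rather than as Michell's own derivation, is as the small-angle approximation to the fraction of the celestial sphere lying within 1° (60′) of a given point: the exact fraction is (1 − cos θ)/2, which is close to (θ/2)² for small θ. The sketch checks that 2 radians is about 6875.5′ and compares the two values.

```python
import math

def cap_fraction(theta):
    """Fraction of a sphere's surface within angular distance theta (radians) of a point."""
    return (1 - math.cos(theta)) / 2

arcmin = math.radians(1 / 60)          # one minute of arc, in radians
two_radians_in_arcmin = 2 / arcmin     # should be close to 6875.5
theta = 60 * arcmin                    # 60 minutes of arc = 1 degree
michell_ratio = (60 / 6875.5) ** 2
exact_fraction = cap_fraction(theta)

print(two_radians_in_arcmin, michell_ratio, exact_fraction)
```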
Commenting on this problem, McCormmach writes


I know of no true probabilistic analysis before Michell’s. 
Surely one reason for why this sort of reasoning was not 
common is that probability, or chance, was considered the 
antithesis of law or design; and it was precisely in the heav- 
ens where eighteenth-century philosophers found their most 
persuasive evidence of order. [1968, p. 140] 


A modern calculation replaces Michell’s figure of “somewhat more 
than 496000” by 476189. 
The figures are very little changed if sin 3h is used instead of 3h" 
As noted by Todhunter [1865, art. 622], the numerical results quoted 
by Herschel as being Struve’s do not agree with the latter’s work. The 
results are however correctly given in the 1873 edition of Outlines of
Astronomy. 
According to the Dictionary of National Biography, Forbes, whose 
mother had been “the first love of Sir Walter Scott”, was elected 
F.R.S. “at the unprecedented early age of nineteen”. 
Forbes (op. cit.) points out that this value is approximately $n^2/2p$.
See Harrington [1988] for a popular discussion of optical double and 
visual binary stars. 
Lancaster [1994] finds the mathematical difficulty in Michell’s work 
to lie not in the requiring that the Seven Sisters lie within a specified 
circular area of given apparent diameter, but rather in the requiring 
that they fall within any such area of the same diameter somewhere 
in the heavens. 

A general exposition of Michell’s work is given in Hardin [1966]. 
Writing of Michell’s earthquake theory Hardin says 
Although Michell’s thesis is mistaken, his work contains a
considerable number of important insights into a variety of 
geological phenomena [p. 30]. 


Something similar could surely be said of the paper we are examining 
here. 

Bertrand’s solution of this problem is discussed in Sheynin [1994]. 
Bearing in mind the amount of commentary on Michell’s work that we 
have discussed here, the reader cannot but agree with Hardin [1966] 
who has made a special point of commenting on 


the great expenditure of time and effort which have been 
required up through the present day to achieve only partial 
clarification of these logical questions. [p. 40] 


For biographical details of Beguelin see Netto [1908, p. 227], and for 
a general discussion of his work on probability see Todhunter [1865, 
arts 603-616]. 

Sur les suites ou séquences dans la lotterie de Genes, Histoire de 
l’Académie ... Berlin, 1765 (published 1767). 

The year of publication is uncertain. 

The problems are numbered I to XI, but there is no Problem IX. 
See Pearson [1978, pp. 599, 601]. For a detailed discussion of Pearson’s 
correspondence with R.A. Fisher on this problem see Inman [1994]. 
All references are to Serret’s edition of the Œuvres de Lagrange.
Pearson [1978, pp. 598-599]. 

“Lagrange ... gave such a cloudy discussion of a problem in inverse 
probability that it is doubtful whether he had read Bayes” [Stigler 
1975, p. 505]. Stigler [1986a, p. 118] concludes also that “I think it is 
fair to say that his work was untouched by any real sense of inverse 
probability.” 

Pearson [1900]. The $(P, \chi^2)$ problem featured in a comparatively mild
dispute between Pearson and R.A. Fisher in an exchange of letters 
printed in Nature in 1935. In his first letter of the 24th August Pear- 
son wrote 


I introduced the $P, \chi^2$ test to enable a scientific worker
to ascertain whether a curve by which he was graduating 
observations was a reasonable ‘fit’. [p. 296] 


Here ‘graduating’ means the fitting of a mathematical model to ob- 
served data. For further details of the controversy see Inman [1994]. 
On the naming of this distribution see Patel and Read [1982]. 
Pearson [1978, p. 156]. 

Quoted here from Pearson [1978, pp. 600-601].

Pearson [1978, p. 601]. 
Mis-spelt in the bibliography in Keynes [1921/1973]. Hacking [1971]
finds Emerson “a curious example” of “the lesser minds of the period 
that take the conceptual matters seriously instead of ploughing on 
with the mathematics” [p. 350]. Cajori [1919b] writes of Emerson, a 
self-taught mathematician, that “he wrote many mathematical texts 
which indicate a good grasp of existing knowledge, but not great 
originality” [p. 192]. Taylor combines these opinions in saying


at Darlington there was William Emerson, an eccentric of 
small private means who devoted himself to turning out 
simple textbooks on all branches of pure and applied math- 
ematics, designed particularly for the self-educated man — 
the ‘mechanic’ as the author put it. Emerson produced 
a couple of dozen of such books, and there is contempo- 
rary evidence that they were widely used, and their author 
greatly esteemed. Yet he felt himself neglected. 

[1966, pp. 34-35]


This refusal is given in Emerson [1793] as follows: “It was a d—n’d 
hard thing that a man should burn so many farthing candles as he 
had done, and then have to pay so much a year for the honor of 
F.R.S. after his name. D—n them and their F.R.S. too.” Taylor puts 
the matter somewhat euphemistically: 


A teacher of mathematics, he [i.e. Emerson] said, received 
no encouragement, and if, indeed, he were offered the re- 
ward of a Fellowship of the Royal Society, he found himself 
out of pocket by the quarterly subscription. [1966, p. 35] 


Emerson was not the only one to refuse an honour because of the 
cost: the refusal of a “Distinguished Officer” to accept an honorary 
degree from Oxford in the 19th century was commemorated in the 
following epigram by H.L. Mansell: 


Oxford, no doubt you wish me well, 
But, prithee let me be: 

I can’t alas! be D.C.L. 
Because of L.S.D. 


(see Booth [1865, p. 315].) 

Compare the axioms and definitions given by Bayes and de Moivre. 
Dinges [1983, p. 88] finds evidence of both the aleatory and the epis- 
temic notions of probability in Emerson’s work; Hacking [1971] sees 
merely a “groping for the idea of probability as ‘judgement’ or cred- 
ibility.” [p. 351]. 

Noting Buffon’s gifted amateurism in mathematics, Coolidge writes 
There certainly never was a man belonging to that class
which I have called amateur mathematicians who had a 
wider interest in all science, especially descriptive science, 
than George-Louis Leclerc, Comte de Buffon. [1990, p. 171] 


Shortly before he died Buffon became blind, an event that occa- 
sioned the following epigram: 


Ah! s’il est vrai que Buffon perd les yeux,
Que le jour se refuse au foyer des lumières:
La Nature à la fin punit les curieux,
Qui pénétroient tous ses mystères.


which Mrs Piozzi translated as follows: 


Buffon’s bright eyes at length grow dim, 
Dame Nature now no more will yield, 
Or longer lend her light to him 
Who all her mysteries revealed. 


(see Booth [1865, p. 206].) 

Part of the text of this essay is Buffon’s Mémoire sur le jeu de franc-carreau
of 1733, an early attempt at geometrical probability (see
Roger [1978, p. 29]). For a review of three books on Buffon see Sloan 
[1994]. 

This Coolidge describes as “obviously a very foolish question” [1990, 
p. 172]. 

It is clear from other passages in this memoir that Buffon did not 
regard probability as normed. For further discussion of his work see 
Coolidge [1990, chap. XIII] and Zabell [1988a]. 

In the original, p. 64 is followed immediately by p. 85. 

A rough translation runs as follows: Distinguished mathematicians 
have studied this matter, especially the famous Laplace in the Notes 
of the Paris Academy. Since however in the solving of problems of 
this type advanced and hard analysis may have been applied, I have 
considered it worth the effort to address the same questions by an 
elementary method and appropriate use of a knowledge of series. By 
that theory this changed part of the probability calculus might be 
reduced to the theory of combinations, as I first derived in a disserta- 
tion transmitted to the Royal Society. I shall undertake to touch upon 
these questions briefly here, by a lucid, especially rigorous method. 
Todhunter [1865, arts 766, 767 & 774]. For a similar opinion see Can- 
tor [1908, p. 243]. 

In his papers Trembley abbreviates his first name to “Io.” 
Trembley’s word is “schedulas”: our translation is more convenient 
than the literal “small strips of papyrus” or “small leaves of paper”. 
Todhunter [1865, art. 773] is slightly inaccurate in noting that “Trembley
remarks that problems in Probability consist of two parts”: what
Trembley in fact wrote was “E supradictis sequitur Probabilitatem 
causarum ab effectibus oriundam, methodam requirere quae duabus 
constat partibus.” 

A particular case was later considered by Terrot (see §8.18): Zabell 
[1989b] has noted that the finite rule of succession was later indepen- 
dently given by Ostrogradskii [1846]. 

Todhunter’s reference here is in fact to Condorcet rather than Laplace.
Further details may be found in Todhunter [1865, art. 851]. For a
discussion of Prevost’s work on testimony see Zabell [1988b].

Notice a curious inversion of the “editorial we” in the last clause of 
this quotation. 

Indeed, the fourth volume of Gauss’s Werke, which contains his pa- 
pers on geometry and probability, carries papers on the correction of 
errors only. For a general discussion of Gauss’s contributions to statis- 
tics see Sprott [1978]. R.A. Fisher [1970, pp. 21-22] may be consulted 
for an opinion on Gauss’s appreciation of the method of maximum 
likelihood. 

According to Bühler’s biography of 1981, Gauss’s christian names
were Johann Friedrich Carl. 

This passage is translated by Davis [1857, p. 255] as follows: “If, any 
hypothesis H being made, the probability of any determinate event E 
is h, and if, another hypothesis H’ being made excluding the former 
and equally probable in itself, the probability of the same event is 
h’: then I say, when the event E has actually occurred, that the
probability that H was the true hypothesis, is to the probability that 
H’ was the true hypothesis, as h to h’.” Le Cam [1986] states that 
“The ‘proof’ of Bayes formula by Gauss cannot even be considered 
adequate by the standards of his time or earlier ones” [p. 79]. 

For a discussion of the reasons for the qualification μ > ν see p. 254
of Davis’s translation of the Theoria Motus Corporum Coelestium.
Sprott [1978] sees here a special case of Bayes’s Theorem, a result 
that he describes as “merely an expression of the addition and multi- 
plication rules of probability” [p. 190]. He notes further [p. 191] that 
Gauss’s interpretation of his results was given in a frequency rather 
than a Bayesian sense. 

Morgan’s mother Sarah, Price’s sister, married William Morgan, a 
surgeon in Bridgend, Glamorganshire. For biographies of Price (1723- 
1791) see Holland [1968], Laboucheix [1970] and Thomas [1924]. 

For details of the Morgans see Pearson [1978, pp. 395-396, 408]. 
This reference of Morgan’s is mysterious. From correspondence with 
the Library of Congress (Science and Technology Division) and the 
American Philosophical Society I learn that (i) the Society was not 
in existence between 1745 or 46 and 1767-68, and (ii) the minutes or
records of an earlier society, with which Franklin was associated and 
which preceded the American Philosophical Society, for 1762-66 have 
been lost. It can only be assumed, from lack of corroborative detail, 
that Morgan erred here. For (similar) remarks on Morgan’s accuracy 
see Holland [1968, pp. 45-46]. 

The date is mistakenly given as 1806 in Laurent [1873]. Crepel [1988a] 
says of this work “Lacroix n’est que partiellement disciple de Con- 
dorcet: son objectif, beaucoup plus pédagogique, est différent, il s’agit 
de rendre accessible à un public suffisamment nombreux non seulement
certaines idées de Condorcet, mais surtout la ‘Théorie Analytique
des Probabilités’ de Laplace. Le traité de Lacroix, qui constituera
le manuel de référence en français jusqu’aux trois-quarts du
19e siècle, va en fait gommer les aspects qui nous semblent aujourd’hui
les plus novateurs dans l’oeuvre de Condorcet” [§7(c)]. 

For a discussion of Lacroix’s work on testimony see Daston [1988, 
pp. 339-492]. 

The reference is probably to Laplace’s Mémoire sur les probabilités. 

See Todhunter [1865, art. 1057]. 


Chapter 6 


1. For a general discussion of Condorcet’s work see Gouraud [1848,
pp. 89-104], Maistrov [1974], Pearson [1978] and Todhunter [1865].
For a brief discussion of his work on probability see Baker [1975,
p. 81].

2. Gillispie [1972, p. 15], writing of the memoir, says “It will hardly be
worth while to follow him in these writings obscurely expounding the
reasonings and procedures of probability itself in relation to causality
and epistemology.” Recent work by Crepel, however, has been devoted
to denying, if not indeed refuting, the existence of such obfuscation
(see in particular his [1988a], [1988b], [1989a] and [1989b], and Bru
and Crepel [1989]).

3. See Laplace’s Mémoire sur les probabilités, and also Todhunter [1865,
art. 773] and Trembley [1795-1798].

4. Condorcet’s rebarbative notation has been altered and some obvious
misprints have been corrected.


5. Translated by Pearson [1978, p. 456] as “between two contingent
events becoming actual”.

6. Various other “multiple Bayes’s integrals” are given, but this illustration
is sufficient.

7. The evaluation of the second of these integrals may be carried out as
in our preceding discussion: that of the first will be found in Appendix
6.2.

8. The reference is to Laplace’s Sur les approximations des formules qui
sont fonctions de très grands nombres of 1782.

9. See Pearson [1978, p. 457].


10. For further comments on Condorcet’s failure to mention Bayes see
Stigler [1975, p. 505]. Note also Pearson [1978, p. 181].

11. The persistence of this habit to this day is remarked on by Neveu
[1965, p. ix].

12. Todhunter [1865, art. 734] has “the next p+q trials”, as does Pearson
[1978, p. 458]: the adjective is not present in the original, though this
was probably the intent.

The expression in (2) may be viewed as a predictive distribution, as it 
expresses the probability of some future sample (p,q) given observed 
data (m,n). The dependence of this distribution upon the prior (here 
assumed to be uniform) was not seen by statisticians as important: 
thus, in considering a problem whose solution required what is essen- 
tially an inverse to Laplace’s theorem, Bowley wrote 


This example then illustrates a theorem that we may give 
as obvious: that, except in the neighbourhood of the central 
value, it is indifferent what distribution of a priori probabilities
of p we suppose. Over the small, important central
region the assumption that the a priori probability of p 
over a region is proportional to that region is likely to be a 
good first approximation, whatever the actual law. 

[1926, p. 414]


In 1933 Watanabe showed that, under certain mild assumptions, the 
predictive density is, in the limit, the binomial 


(P29) 


where m/(m+n) → 2p as m+n → ∞. For remarks on the independence,
in the limit, of the posterior distribution on the prior the
reader is referred to Nikulin [1992]. 
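
The limiting behaviour described in this note can be checked numerically. The following is a minimal sketch only, under two assumptions that are mine rather than the sources': the prior is uniform (as stated above), so that the predictive probability of p occurrences and q non-occurrences in the next p+q trials, given m and n already observed, is C(p+q, p) B(m+p+1, n+q+1) / B(m+1, n+1); and the binomial parameter is taken as x = m/(m+n), a symbol chosen here for illustration. The function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b), via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive(p, q, m, n):
    """Predictive probability of (p, q) in the next p+q trials, given (m, n)
    observed and a uniform prior (assumption stated in the lead-in)."""
    log_ratio = log_beta(m + p + 1, n + q + 1) - log_beta(m + 1, n + 1)
    return comb(p + q, p) * exp(log_ratio)


def binomial(p, q, x):
    """Binomial probability of p occurrences in p+q trials with chance x."""
    return comb(p + q, p) * x**p * (1 - x) ** q


def max_gap(m, n, future=5):
    """Largest absolute difference between the predictive and the binomial
    probabilities over all splits of `future` further trials."""
    x = m / (m + n)
    return max(
        abs(predictive(p, future - p, m, n) - binomial(p, future - p, x))
        for p in range(future + 1)
    )


if __name__ == "__main__":
    # Keep m/(m+n) fixed at 3/5 while the number of observations grows.
    for scale in (1, 10, 100, 1000):
        m, n = 3 * scale, 2 * scale
        print(m + n, max_gap(m, n))
```

With the ratio m/(m+n) held fixed, the printed gap shrinks as m+n grows, which is the sense in which the predictive distribution approaches the binomial.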

For further details see the preface to Pearson [1978]. 

See Dale [1982]. 

The formula given here is as it appears in Condorcet’s memoir (with 
a slight change in notation). However the answer 3/5 given by Con- 
dorcet for the case in which m = 2,p = 1,n = q = 0, is not obtainable 
from the formula as printed, but is obtainable from a formula having 
the terms $s_i^{m+p}/t$ and $s_i^{m}/t$ replaced respectively by $(s_i/t)^{m+p}$ and
$(s_i/t)^{m}$. This latter is in fact the form given in Todhunter [1865, art.
734].

For some comments on Condorcet’s work on testimony see Zabell 
For further examination of this formula see Owen [1987, §3], Sobel
[1987, §2] (where the Bayes-Laplace rule is recast as “The Hume- 
Condorcet Rule for the Evidence of Testimony” ) and Todhunter [1865, 
art. 735]. 

This sixth part is discussed in some detail in Todhunter [1865, arts 
737-751]. 

For an opinion in turn on Gouraud’s exuberance see Todhunter [1865, 
art. 753]. The awkwardness of Condorcet’s expression seems to have 
manifested itself early in his career. According to Baker [1975, p. 6] 
the first paper submitted by Condorcet to the French Academy of 
Sciences was rejected by Clairaut and Fontaine, who had been charged 
with its examination, on account of “its sloppiness and its lack of 
clarity”.

See Todhunter [1865, art. 467]. 

Hacking [1971, p. 351] considers no phrase in our subject “less felici- 
tous” than Condorcet’s probabilité moyenne.

Similarly harsh sentiments have been expressed by Bertrand, who, 
in commenting on Condorcet’s Essai, wrote “Aucun de ses principes
n’est acceptable, aucune de ses conclusions n’approche de la vérité” 
[1972, p. 319]. Gillispie [1972, p. 12], on the other hand, describes 
Todhunter’s judgement as “harsh”, and he provides some comments 
by Condorcet’s contemporaries as evidence of the esteem in which he 
was held. 

See Hacking [1971, p. 351]. 

Hacking [1971, p. 351] considers Condorcet as the first to render ex- 
plicit the “groping for the idea of probability as ‘judgement’ or credi- 
bility.” For comments on the distinction between “logical” and “phys- 
ical” probabilities to be found in the works of D’Alembert, Condorcet 
and Laplace, see Baker [1975, pp. 177-178]. 

Condorcet describes the third part of this work as an “Ouvrage plein
de génie & l’un de ceux qui font le plus regretter que ce grand homme
ait commencé si tard sa carrière mathématique, & que la mort l’ait
si-tôt interrompue” [p. viij].

See Gillispie [1972, p. 15]. 

Writing of the use of Bayes’s Theorem in the probability of judgments,
Poisson [1837, p. 2] says “il est juste de dire que c’est à Condorcet
qu’est due l’idée ingénieuse de faire dépendre la solution, du
principe de Blayes [sic], en considérant successivement la culpabilité
et l’innocence de l’accusé, comme une cause inconnue du jugement
prononcé, qui est alors le fait observé, duquel il s’agit de déduire la 
probabilité de cette cause.” 

Perhaps one sees here an adumbration of the Principle of Irrelevant 
Alternatives. 

For a summary of the Essai see Cantor [1908, pp. 253-257].
As precursors in the search for a method for the determination of the
probability of future events from the law of past events Condorcet
cites Bernoulli, de Moivre, Bayes, Price and Laplace [p. lxxxiij].

In Condorcet’s notation, (23) is written mtr In Problem 3, how- 
ever, é is used to mean “infinity”. This is a prime example of what 
Todhunter [1865, art. 660] describes as Condorcet’s “repulsive pecu- 
liarities”. Pearson [1978, p. 480] argues that the curve of judgements 
should be of the form yo(z — 1/2)?(1 — z)!. 

See Dinges [1983, pp. 68, 95] for comment on the occurrence of this 
result in Condorcet’s work. 

The integrand in the second integral is given by Condorcet as (1—z)”. 
Todhunter [1865, art. 698].


Here we have another example of Condorcet’s awkward notation: the 
integral (ee z™(1—2)" dz is written as [ aa dz in the original. 
As Todhunter [1865, art. 701] points out, Condorcet ought to say “let 
the probability not be assumed constant”. 

It is this result that Pearson [1978, p. 366] describes as “really Con-
dorcet’s and Laplace’s extension of Bayes.” 

The factor =) is missing in the original. 

See Gillispie [1972, p. 16] for a general discussion of Condorcet’s appli- 
cation of probability to the voting problem: Auguste Comte’s opinion 
on the matter is discussed in Porter [1986, p. 155]. 

See Pearson [1978, pp. 482-489] for a discussion of Parts 4 and 5. A 
general discussion of the probability of decisions, with special refer- 
ence to the work of Condorcet, Laplace and Poisson, may be found 
in Chapter XIII of Bertrand [1972]. 

For a general discussion of this paper see Pearson [1978, pp. 501-505]. 
Two of which are numbered VI. 

“Qui a pour objet l’application du calcul aux sciences politiques et 
morales” [p. 171]. 

These two articles are respectively entitled “De l’intérét de l’argent” 
[pp. 2-31] (the first page is an introduction) and “Sur une méthode 
de former des tables” [pp. 31-56]. See Crepel [1988a] for a discussion 
of these articles. 

For reference to earlier work on testimony by John Craig (c.1663- 
1731) see Pearson [1978, p. 465] and Stigler [1986c]. The New Dic- 
tionary of National Biography will contain details of Craig’s life and 
work; his Theologiae christianae principia mathematica of 1699 is the
subject of a deep study by Nash [1991]. 

On this point see Pearson [1978, p. 502]. 

The first Article VI (see Note 43 above) is entitled “Application du
calcul des probabilités aux questions où la probabilité est determinée”
[pp. 121-145]; the second is “De la manière d’établir des termes de
comparaison entre les différens risques auxquels on peut se livrer avec
prudence, dans l’espoir d’obtenir des avantages d’une valeur donnée”
[pp. 145-150], while the seventh article is “De l’application du calcul
des probabilités aux jeux de hasard” [pp. 150-170].


Chapter 7 


1. For biographical details of Laplace see Cantor [1908, p. 228], David 
[1965], Maistrov [1974, pp. 135-138], Pearson [1929], Pearson [1978,
pp. 637-650] and Whittaker [1949]. 

2. Some of these memoirs are cited here only for general definitions, 
and not for any Bayesian results: Gillispie [1972, p. 3] in fact finds 
only nine memoirs relevant to probability. For a general discussion 
of Laplace’s early work see Gillispie [1979] and Stigler [1978]. Also 
useful are Sheynin [1977] and Stigler [1975]. 

3. This definition of probability as a ratio of numbers of cases occurs in 
the second edition of de Moivre’s Doctrine of Chances of 1738 (see 
Schneider [1968, p. 279]). However the idea is also evident in the four- 
teenth chapter, “De punctis geminatis” of Cardano’s Liber de Ludo 
Aleae (written c.1564), where we find the words “Una est ergo ratio
generalis, ut consideremus totum circuitum, & ictus illos, quot modis 
contingere possunt, eorumque numerum, & ad residuum circuitus, 
eum numerum comparentur, & iuxta proportionem erit commuta- 
tio pignorum, ut aequali conditione certent.” (See Boldrini [1972, 
p. 125], David [1962, chap. 6] and Ore [1953].) For a discussion of 
equipossibility as it arises in probability see Hacking [1971], [1975] 
and van Rooijen [1942] (the latter contains an illuminating contrast 
between the Dutch terms “gelijkwaardig” and “even waarschijnlijk” ). 
Laplace’s approach to his “definition” of probability was not uncom- 
mon. Robinson [1966, p. 265] notes that “in the approach of Euclid 
and Archimedes, which is also the approach of de l’Hospital, a definition
frequently is an explication of a previously given and intuitively
understood concept, and an axiom is a true statement from
which later results are obtained deductively”. See Gini [1949] for a 
discussion of the difference between the concept and the measure of 
probability. 

4. For a general discussion of (parts of) this memoir see Cantor [1908, 
pp. 241-242] and Gillispie [1972, pp. 4-5]. In addition to the passages
considered here, this memoir is noteworthy for its discussion of the 
Normal probability density function (see Keynes [1921, chap. XVII, 
§5]). Stigler [1986b] provides a general discussion and a translation 
of the memoir. 

5. When this memoir was written is uncertain: Baker [1975, p. 433], 
acting on a suggestion by Hahn, suggests that it might have been 
written in 1774. Stigler [1978, p. 253], however, is not convinced by 
this suggestion, and his investigations lead him to a date of 1773. 
6. Molina [1930] finds Todhunter’s discussion of the work of Bayes and
Laplace on the probability of causes “most inadequate”.

7. We might also point out, as Porter [1986, p. 93] has noted, that
Laplace should be viewed as an independent developer of inverse
probability.

8. The word used in the original is “établirons”, which may be translated
in terms of “assert”, “prove” or “establish”: since no proof is given,
I have chosen the first.

9. This principle is discussed in Keynes [1921, chap. XVI, §§11-14]. Van
Dantzig [1955, p. 36] seems to regard Laplace’s elaboration of the
theory of the probability of causes as a youthful aberration, while 
Hacking [1971, p. 348] suggests that Laplace had “fewer philosophical 
scruples” than Bayes — an opinion that seems to be shared by Dinges 
[1983, p. 67]. Gouraud [1848], in commenting on Condorcet’s Essai,
writes of the “principe récemment entrevu par Bayes et démontré par
Laplace” [pp. 95-96]: however he finds in Bayes’s Essay both a direct
determination of the probability “que les possibilités indiquées par
les expériences déjà faites sont comprises dans des limites données”
and “la première idée d’une théorie encore inconnue, la théorie de la
probabilité des causes et de leur action future conclue de la simple 
observation des événements passés” [p. 62], and it is not clear to which 
of these he is referring. 

It should be noted that Laplace nowhere bestows on it this appella- 
tion, despite what Maistrov [1974, p. 100] says. 

Catalan, in his discussion of this problem, finds it necessary to draw 
attention to the wording of the “futur admirable écrivain” [1888, 
p. 256]. 

See also Molina [1930, p. 382].

The reason for the “dx” in the numerator in (1) is nowhere explained.
However, it was not an uncommon practice in the nineteenth century
to assign infinitesimal masses to points (rather than infinitesimal volumes)
in the case of continuous distributions. Thus, while we would
today interpret the numerator in (1) as Pr[x < X < x + dx], Laplace
had little choice in arriving at (1) as he did. 

For comment on Laplace’s and Gauss’s introductions of the Normal 
probability density function see Stigler [1980b, p. 153]. 

The problem of appropriate division of the accumulated pot as it 
occurs in the game “primero” is discussed in Cardano’s Liber de Ludo 
Aleae: see Ore [1953, p. 117].

For further comment see Keynes [1921, chap. XVII, §§5—7], Sheynin 
[1977] and Stigler [1986a, pp. 105-117]. 

See Laplace’s Mémoire sur les probabilités [1778, pp. 476-477] for 
discussion of the case of different y’s. 

See de Morgan [1837, Part II, p. 247]. The milieu de probabilité is, as
Stigler [1986a, p. 109] notes, just the posterior median. 
As Wilson [1922-1923, p. 841] has noted, “the first two laws of error
that were proposed both originated with Laplace.” The first of these 
laws is the one discussed here: the second was given in the memoir 
of 1778. For the place of this work in the theory of least squares see 
Harter [1974]. 

The Laplacean distribution, in the form 


$$\Pr[dx \mid m, a, H] = \frac{1}{2}\exp\left(-\frac{|x-m|}{a}\right)\frac{dx}{a},$$


is described in Jeffreys [1961, p. 213] as the median law. Jeffreys (loc. 
cit.) remarks that 


The interest of the law is reduced somewhat by the fact 
that there do not appear to be any cases where it is true. 
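
The name “median law” reflects a standard property of this density: for fixed a, the likelihood of the location m is greatest at the sample median, since maximising it amounts to minimising the sum of absolute deviations. The sketch below is offered only as an illustration of that property; the function names, the sample and the grid of candidate locations are assumptions made here and are not taken from Jeffreys or Laplace.

```python
from math import log
from statistics import median


def log_density(x, m, a):
    """Log of the density (1/(2a)) * exp(-|x - m| / a)."""
    return -abs(x - m) / a - log(2 * a)


def log_likelihood(data, m, a):
    """Log-likelihood of location m for independent observations."""
    return sum(log_density(x, m, a) for x in data)


def best_location(data, a, candidates):
    """Candidate location with the greatest log-likelihood."""
    return max(candidates, key=lambda m: log_likelihood(data, m, a))


if __name__ == "__main__":
    sample = [1.0, 2.0, 2.5, 7.0, 40.0]    # hypothetical observations
    grid = [i / 2 for i in range(81)]      # candidate locations 0.0, 0.5, ..., 40.0
    print(best_location(sample, 1.0, grid), median(sample))
```

The outlying observation at 40 does not move the maximising location away from the median, which is the practical attraction of the law whatever its empirical standing.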


My discussion here owes much to Stigler [1986a, pp. 113-117]. 

This formulation seems to get round a difficulty seen by Sheynin 
[1977, p. 7] in the defining of the integral. 

For further comment on the St Petersburg paradox see Note 11 to 
Chapter 8, Westergaard [1968, pp. 106-110] and the correspondence 
between Niklaus Bernoulli, Daniel Bernoulli, Pierre Rémont de Mont- 
mort and Gabriel Cramer, reprinted as the ninth commentary in Vol- 
ume 3 of Die Werke von Jakob Bernoulli of 1975. The problem is 
described in Dale [1995, p. 143]. The game known as “cross or pile” 
is discussed in Brewer’s The Dictionary of Phrase and Fable. 
Hacking [1971] traces the publicization of the principle of indifference
to Bernoulli’s Ars Conjectandi.

The published version mistakenly gives the date of reading as “10 
Février 1773” instead of “10 Mars 1773” (and continued on the 17th): 
see Baker [1975, p. 432] and Stigler [1978, p. 252]. The memoir is 
described by Gillispie [1972, p. 5] as “astonishing”.

For comment on the precise use of the term hazard in the Ency- 
clopédie ou Dictionnaire raisonné des sciences, des arts et des métiers 
[1751-1765], see Novy [1980, p. 29]. 

See Gillispie [1979]. The original manuscript has apparently not been 
preserved, though a copy is to be found in the Procès-Verbaux, t. 96,
1.122. 

All page numbers of quotations in this section are from Gillispie 
[1979]. 

For instance, letting e; denote a positive error and f; a negative error, 
one may require 


(i) $\sum e_i = \sum f_i$, or
(ii) $\sum e_i \Pr[e_i] = \sum f_i \Pr[f_i]$, etc.
Presumably the argument runs as follows: suppose we have only two
systems S; and S2 with S, = {y1, 95}, S2 = {¥7, 3}, and where yi 
represents the probability of an error in the jth observation in the 
ith system. (We shall write y} for the more correct }(q¢) — 2), etc.) 
Then 7 

re=i~r , 1€ {1,2}, 
and hence 

mere pe etl ee: 
Pre To Py Po Po 


Now 


(yi + y{)(~2 + 93) viv, + vis + vi—s + 193 


= rytre+(y?y9 + v1) , 


and presumably the parenthesized term is zero since no S; gives rise 
to such a combination of y’s. 


It is not clear here whether by “nombre infini” Laplace means an 
infinite, or merely a very large, but finite, number: I suspect the 
latter. 

The following sketch shows a possible distribution of m = 7 points 
over h = 2: 


[Sketch: seven points, marked ×, plotted above a horizontal axis labelled A, B and N.]


The integration could presumably be effected by expansion of the 
logarithms in series and term-by-term integration. 

See Baker [1975, p. 434] and Gillispie [1972, p. 8]. 

This abstract is known to be by Condorcet — see Baker [1975, p. 169] 
and Stigler [1975, p. 252]. 

For a general summary of the memoir see Gillispie [1972, pp. 8-10]. 
For further discussion of this problem see Netto [1908, p. 232] and 
Makeham [1891a, pp. 242-243]. See also Stout and Warren [1984, 
p. 212], where it is stated “We are concerned with efficiently using 
flips of a coin of unknown bias to simulate a flip of an unbiased coin. 
This problem is quite natural in that when given an arbitrary coin 
one should assume that it has some unknown bias.” The importance 
of the distinction between the flipping and the spinning of the coin 
is emphasized by Shafer [1994, p. 71]. 
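
The problem quoted from Stout and Warren may be illustrated by the classical pairing device usually attributed to von Neumann: flip the coin twice, accept the first outcome if the two flips differ, and otherwise start again. This is a sketch of that elementary device only; it is not claimed to be the efficient procedure studied by Stout and Warren, and the function names, the bias of 0.8 and the seed are assumptions introduced here.

```python
import random


def biased_flip(bias, rng):
    """One flip of a coin that shows 1 with probability `bias`."""
    return 1 if rng.random() < bias else 0


def fair_flip(bias, rng):
    """Simulate an unbiased flip from a coin of unknown bias.

    The outcomes (1, 0) and (0, 1) are equally likely whatever the bias,
    so the first flip of a discordant pair is unbiased.
    """
    while True:
        first = biased_flip(bias, rng)
        second = biased_flip(bias, rng)
        if first != second:
            return first


def estimate_fairness(bias, trials, seed=0):
    """Proportion of 1s among `trials` simulated fair flips."""
    rng = random.Random(seed)
    return sum(fair_flip(bias, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(estimate_fairness(0.8, 20000))
```

The procedure never needs to know the bias; the cost is that pairs of like outcomes are discarded, which is the inefficiency that more refined schemes seek to reduce.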
For the case in which a = 0, see p. 390 of the memoir discussed
and also Todhunter [1865, art. 891]. L’Hospital’s rule is required. 
(According to Boyer [1968, p. 460], this rule is in fact due to Jean 
Bernoulli. Note too that the Marquis’s family name is also spelled 
“Hospital”, “Lhospital” and “l’Hôpital”. You pays your money and
you takes your choice!) 


For details see Todhunter [1865, art. 891].

See Edwards [1978, p. 116].

See Netto [1908, pp. 243-244]. It should be noted that Laplace stresses
the time-order of the events, and on this topic Shafer [1982] may be
consulted.

See Todhunter [1865, art. 893].

There is no mention here of either de Moivre or Stirling, though in
Article XXIII Laplace writes of the “beau théorème de M. Stirling sur
la valeur du produit 1.2.3...u, lorsque u est un très grand nombre”.
An alternative approximation to P may be derived as follows: on 
writing P as a ratio of factorials, and on applying the Stirling-de 
Moivre approximation to each of these factorials, one gets 


$$P \approx \frac{(p+m)^{p+m+\frac12}\,(q+n)^{q+n+\frac12}\,(p+q+1)^{p+q+\frac32}}{p^{p+\frac12}\,q^{q+\frac12}\,(p+q+m+n+1)^{p+q+m+n+\frac32}}. \qquad (*)$$


Using Laplace’s approximation 
$$(p+m)^{p+m+\frac12} = p^{p+m+\frac12}$$


(sic) one can write (*) as 


$$P \approx \left(\frac{p}{p+q+1}\right)^{m}\left(\frac{q}{p+q+1}\right)^{n},$$

which differs from the expression given by Laplace in having (p+q+1) 
instead of (p+ q). 
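
The quality of these approximations is easily examined numerically. The sketch below assumes that P is the ratio of factorials (p+m)! (q+n)! (p+q+1)! / [p! q! (p+q+m+n+1)!], which is the ratio whose Stirling-de Moivre approximation yields (*); this identification, the function names and the trial values are assumptions made here for illustration.

```python
from math import exp, lgamma, log


def exact_P(p, q, m, n):
    """The assumed ratio of factorials, computed through log-gamma."""
    log_value = (
        lgamma(p + m + 1) + lgamma(q + n + 1) + lgamma(p + q + 2)
        - lgamma(p + 1) - lgamma(q + 1) - lgamma(p + q + m + n + 2)
    )
    return exp(log_value)


def stirling_P(p, q, m, n):
    """Approximation (*): each factorial k! replaced by sqrt(2*pi) k^(k+1/2) e^(-k).
    The constant and exponential factors cancel between numerator and denominator.
    Requires p > 0 and q > 0."""
    numerator = (
        (p + m + 0.5) * log(p + m)
        + (q + n + 0.5) * log(q + n)
        + (p + q + 1.5) * log(p + q + 1)
    )
    denominator = (
        (p + 0.5) * log(p)
        + (q + 0.5) * log(q)
        + (p + q + m + n + 1.5) * log(p + q + m + n + 1)
    )
    return exp(numerator - denominator)


def product_P(p, q, m, n):
    """The cruder product form with (p+q+1) in the denominators."""
    total = p + q + 1
    return (p / total) ** m * (q / total) ** n


if __name__ == "__main__":
    args = (500, 400, 2, 1)  # hypothetical values with p, q large beside m, n
    print(exact_P(*args), stirling_P(*args), product_P(*args))
```

For p and q large in comparison with m and n the three values agree closely, the product form being the least accurate of the three.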

A generalization of this result is given in Article XXIII: see also Ar- 
ticle XXV. 

For some discussion of this matter see Gillispie [1972, pp. 8-10]: it 
received further consideration by Laplace in a memoir discussed in 
§7.8 of the present work. 

Sheynin [1971b, p. 235] has pointed out that Laplace often used the 
Bayesian conception of supposing that a constant but unknown pa- 
rameter had a prior distribution. This was in fact not done by Bayes 
himself. 

The problem is again considered in the Théorie analytique des prob- 
abilités, but for a period of 40 years rather than 26. According to 
Boldrini [1972, p. 184], W. Lexis, at the end of the nineteenth cen- 
tury, showed “that the probability of masculine births varies with 
time and place.” 
In a letter to Gauss on the 31st May 1809, Legendre noted that this
result was in fact a special case of a more general theorem proved by
Euler (see Plackett [1972, p. 243]).

See Todhunter [1865, art. 902] for details of other discussions of this 
problem.

In defining u and u − x Laplace uses “probabilité” and “possibilité”
respectively: this seems to suggest that he did not always find it nec- 
essary to observe a distinction between these two terms. See Daston 
[1979, p. 266] for a discussion of D’Alembert’s observation of the dif- 
ference. 

Todhunter [1865, art. 902] points out an error in Laplace’s evaluation.
Todhunter [1865, art. 902] finds the solution “very obscure”, and in- 
dicates where a better solution may be found. Laplace’s writing is 
indeed very often difficult to follow: Sheynin [1973, p. 300] mentions 
“the well known obscurity of Laplace’s style”, and notes further that 


Laplace’s work is extremely difficult to read because of 
the absence or insufficiency of intermediate calculations. 
Moreover, conditions under which his problems are actu- 
ally solved are rarely stated explicitly. [p. 301] 


De Morgan too has remarked on the obscurity of Laplace’s style as 
follows: 


No one was more sure of giving the result of an analytical 
process correctly, and no one ever took so little care to point 
out the various small considerations on which correctness 
depends. [1843, Art. 52] 


De Morgan [1838, pp. 87-88] provides a discussion of the separate advantages
conferred by the terms probability, chance, presumption, possibility,
facility and expectation. Lagrange preferred “facilité” for the
physical, objective concept of probability (see Hacking [1971, p. 350]):
Laplace was not always so careful. 
Compare the quotation from p. 419 of this memoir given earlier in 
this chapter. 
As in Note 42 above, we can obtain a slightly different approximation 
to that given here. Let m = p and n = q in Equation (*) in Note 42. 
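Then
$$P \approx \frac{2^{2p+2q+1}\,p^{2p+\frac12}\,q^{2q+\frac12}\,(p+q+1)^{p+q+\frac32}}{2^{2p+2q+\frac32}\,p^{p+\frac12}\,q^{q+\frac12}\,(p+q+\frac12)^{2p+2q+\frac32}} = \frac{1}{\sqrt{2}}\left(\frac{p}{p+q+1}\right)^{p}\left(\frac{q}{p+q+1}\right)^{q}\left(\frac{p+q+1}{p+q+\frac12}\right)^{2p+2q+\frac32}.$$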
Todhunter [1865, art. 904] writes “The theory does not seem, however,
to have any great value.” 

For more details on Laplace’s theory of errors see Sheynin [1977]. 

A discussion of Laplace’s milieu de probabilité may be found in Make- 
ham [1891a, p. 246] and Stigler [1986a, p. 109]: see Note 18 above. 
This is effected by setting f(z + b) ≈ f(z) + bf'(z), where f = y,
z = ax and b ∈ {ap, ap’, ...}.

Gillispie [1972, p. 10] states that it was here that Laplace “first em- 
ployed phrasing famous from his later popularization” (i.e. the Essai 
philosophique sur les probabilités), but I suggest that the sentiment 
is already patent in the memoir discussed in §7.4. 

There is some slight discussion of this formula in Netto [1908, pp. 244— 
245]. 

He in fact finds “la probabilité que la valeur de x est comprise entre 
les deux limites a — 6 et a+6’” [p. 305]. 

See Netto [1908, p. 246]. 

Laplace omits this phrase. 

For comment on this example and related work see Todhunter [1865, 
art. 909]. 

There is some superficial discussion of this memoir in Westergaard 
[1968, p. 82]: a more detailed treatment may be found in Gillispie 
[1972, pp. 10-11]. 

The memoir was in fact read on 30th November 1785. 

Chang [1976] notes that Laplace was the only one of the mathematicians
of his time who examined this estimation problem, to see the
need, and to find an expression, for the precision of the estimate.
Compare Stigler [1986b, p. 361]. 

In his work on statistical series, Andrei Andreevich Markov (1856- 
1922) considered the case in which an event A appeared $k_0$ times
in $n_0$ trials and $k$ times in $n$ trials. This led him to an expression
remarkably similar to that given here, viz.

$$\binom{n}{k}\,\frac{\int_0^1 x^{k+k_0}(1-x)^{(n-k)+(n_0-k_0)} f(x)\,dx}{\int_0^1 x^{k_0}(1-x)^{n_0-k_0} f(x)\,dx},$$

where $0 < m_1 < f(x) < m_2$, say. He then showed that

$$\Pr\left[\,|k/n - k_0/n_0| < \varepsilon\,\right] \approx 1,$$


“so that the tacit assumption that all prior probabilities x of the 
occurrence of A were equally possible did not influence the outcome” 
[Sheynin 1989, p. 357]. 
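The behaviour of an expression of this kind can be checked in the simplest case. The sketch below is an illustration only: it assumes the uniform prior f(x) = 1, reads the ratio of integrals as a predictive probability carrying a binomial coefficient, and uses function names that are hypothetical rather than taken from Markov or Sheynin.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of x**a * (1 - x)**b over [0, 1] for non-negative integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive(k: int, n: int, k0: int, n0: int) -> Fraction:
    """Probability of k occurrences in n further trials, given k0 in n0 trials.

    Assumption of this sketch: the prior f(x) is uniform on [0, 1].
    """
    numerator = beta_integral(k + k0, (n - k) + (n0 - k0))
    denominator = beta_integral(k0, n0 - k0)
    return comb(n, k) * numerator / denominator


if __name__ == "__main__":
    n0, k0, n = 20, 7, 10
    probs = [predictive(k, n, k0, n0) for k in range(n + 1)]
    print("total probability:", sum(probs))
    print("most probable k:", max(range(n + 1), key=lambda k: probs[k]))
```

Exact rational arithmetic is used so that the probabilities over k = 0, ..., n can be seen to sum to one without rounding error.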

See Pearson [1928, pp. 170-171] for an alternative solution. 
There had probably been an earlier publication of these lectures, for 
a footnote on p. 169 reads “Depuis la première publication de ses 
leçons ...” The reference might be to the 1810 publication “Notice 
sur les probabilités”. 

For a discussion see Sheynin [1977]. 

The memoir also contains an example of Laplace’s procedure for find- 
ing the probability distribution of the sum of a number of identically 
distributed random variables: see Seal [1949, pp. 225-226]. Moreover, 
it is here that Laplace “first developed the characteristic function as 
a tool for large-sample theory and proved the first general Central 
Limit Theorem” [Stigler 1975, p. 506]. 

These limits are incorrectly given in the Œuvres complètes as trh/n. 
Some details are repeated on p. 351 of the supplement. 

In the corresponding passage in the Théorie analytique des proba-
bilités the reference to Daniel Bernoulli, Euler and Gauss is replaced 
by the phrase “des géomètres célèbres”. 

Writing on the history of the use of generating functions in probability 
theory, Seal [1949] notes that 


The interesting fact emerges from these references [to some 
of Laplace’s work] that in no case did Laplace use a prob- 
ability generating function to derive an explicit form of 
probability law for the sum of n specified random variables 


[p. 220], 


and further, “Laplace never used the term [generating functions] in 
connexion with the synthesis of a probability distribution” [p. 221]. 
Sheynin [1973, p. 292], however, dissents slightly from Seal’s view. 

Jaynes [1976, p. 233] suggests that “an historical study would show 
that the reasons for the interest of both Laplace and Jeffreys in prob- 
ability theory arose from the problem of extracting ‘signals’ (i.e. new 
systematic effects) from the ‘noise’ of imperfect observations, in as- 
tronomy and geophysics respectively.” 

Note also the brief discussion of Laplace’s Exposition du système du 
monde in his Essai philosophique sur les probabilités. 

For details see Fabry [1893-1895, p. 5]. 

See Sheynin [1976, p. 164] for comment on the difficulty of this 
memoir. 

Fabry [1893-1895, p. 2] points out that “l’absence des orbites hyper- 
boliques est une objection contre cette théorie”. 

The radius of this sphere is taken here to be 10^5 Astronomical Units. 

Gauss [1874, p. 582] gives Laplace’s quaesitum as follows: “Laplace 
findet für das Verhältniss der Wahrscheinlichkeit einer solchen Hy-
perbel, wo die halbe grosse Axe 100 Halbmesser der Erdbahn nicht 
übersteigt, zu der Wahrscheinlichkeit der übrigen Fälle, nemlich einer 
Hyperbel von grösserer Axe, einer Parabel oder einer Ellipse.” 
See also Pollard [1966]. 

For reasons why attention need be restricted only to values of 8 < 7/2 
see Fabry [1893-1895, p. 7]. 

For a proof see Fabry [1893-1895, pp. 7, 8]. 

Fabry [1893-1895, p. 8] terms comets with perihelion distance less 
than d, “comètes visibles”. 

See Fabry [1893-1895, §4]. 

His definition runs as follows: “... la probabilité d’un événement est 
le rapport du nombre des cas favorables au nombre total des cas 
possibles” [p. 3]. 

Fabry [1893-1895] considers something similar in his forty-second and 
forty-third articles. 

Seeliger, in his paper of 1890, introduces a term y(D) at the outset: 
I have experienced some difficulty in following his argument. 

Gauss [1874, p. 582] finds this “einer sehr plausibeln Hypothese.” 
See Fabry [1893-1895, p. 5] for details of these and other papers. 
Seeliger’s [1890] corrections, though at first sight different from those 
of Fabry, can in fact be shown to be in agreement with the latter’s: 
see Fabry [1893-1895, pp. 31-34]. 

See Schiaparelli [1874, p. 80]. 

See Fabry [1893-1895, p. 19]. 

Compare Schiaparelli [1874, p. 80]. 

Fabry [1893-1895, p. 20] comments on Laplace’s procedure as follows: 
“Il est à remarquer que par la manière dont Laplace conduit son 
calcul, il fait U infinie seulement implicitement en supposant 2 infinie; 
c’est peut-être pour cela qu’il n’a pas réfléchi aux conséquences de 
cette supposition.” 

See Fabry [1893-1895, pp. 25-26] for details. 

The expression given in (46) differs from that given by Gauss and 
that given by Seeliger: see Fabry [1893-1895, p. 34] for a discussion. 
See Schiaparelli [1874, p. 80]. 

Note Fabry [1893-1895, pp. 31-43]. Further development of the mat- 
ter discussed in this memoir may be found in the papers by Fabry, 
Schiaparelli and Seeliger. 

Sheynin [1977, p. 59], in writing of the first Supplement, says that 
it is “essentially compiled from two memoirs”, but a cursory exami- 
nation shows that the two early papers are not just reprinted in the 
Supplement. 

The first edition of 1812 was dedicated to Napoleon: for a discussion 
of the suppression of this dedication in later editions see Pearson 
[1929]. The full text of the dedication is given in Todhunter [1865, 
art. 931]. 

For a general discussion of this Leçon see Fagot [1980, pp. 59-77]. 

A copy of this Notice is provided in Gillispie [1979, pp. 265 et seq.]. 

General comments on the Essai may be found in Maistrov [1974, 
§III.9] and Neyman & Le Cam [1965, pp. iv-ix]. More detailed studies 
are given in Pearson [1978, pp. 651-703] and Todhunter [1865, arts 
933-947]. See also Herschel [1857, p. 393]. A translation of the Essai 
is given in Dale [1995]. 

See Coolidge [1949, §XIII.2] and Zabell [1988a]. 

See also Pearson [1978, p. 660] and Todhunter [1865, art. 643]. 

For comment on this matter see Keynes [1921, chap. XVI, §§16-19], 
Pearson [1978, pp. 671-672, 682-683] and Zabell [1988b]. 

See Dale [1995, pp. 180-181] for a slightly more detailed discussion. 
Note also the extensions to other examples given there. 

See Pearson [1978, p. 674] for comment on a similar four-fold division 
elsewhere in Laplace’s work. For a discussion of Laplace’s work on 
testimony see Zabell [1988b]. 

In an anonymous review of 1837, de Morgan describes this work as 
“the Mont Blanc of mathematical analysis”, but he qualifies this with 
the words “the mountain has this advantage over the book, that there 
are guides always ready near the former, whereas the student has been 
left to his own method of encountering the latter” [p. 347]. Bertrand 
[1972, p. v] writes “Le Calcul des probabilités est une des branches 
les plus attrayantes des Sciences mathématiques et cependant l’une 
des plus négligées. Le beau livre de Laplace en est peut-être une des 
causes.” 

Some discussion may be found in Molina [1930] and, in extenso, in 
Todhunter [1865, arts 948-968]. 

Page numbers throughout this section are to the seventh volume of 
the Œuvres complètes de Laplace of 1886. 

See Sheynin [1971b, p. 237] for a discussion of the looseness of this 
definition of probability. Edgeworth, however, gives at least qualified 
assent to this definition, writing 


Nor does there appear any objection to the use of such 
phrases as Donkin’s “sufficient reason,” or Laplace’s “num- 
ber of favourable cases,” provided it is admitted that they 
are but short titles of the voluminous records of experience; 
or at least, what the better class of a priorists would admit, 
that general propositions cannot dispense with experience. 


[1884c, p. 160] 


See Clero [1988] and Shafer [1982] for discussion of the importance of 
the consideration of time-dependence in conditional probability. 
Compare the discussion of Principle VII. 

For comment on Laplace’s two methods of inversion of Bernoulli’s 
Theorem see Dale [1988b, §6] and Monro [1874, pp. 74, 77]. Note 
also de Morgan [1838b]. Laplace’s inverse application is also noted by 
Cajori [1991, p. 377]. 
Edgeworth [1908] finds in this article the employment of the genuine 
inverse method in the determination of the most probable value of 
something being measured. However, he claims that the true character 
of inverse probability is overshadowed here by the doctrine of greatest 
advantage. 

See Neyman & Le Cam [1965, p. vii] for comment on Laplace’s use 
of the posterior distribution of a parameter given a large number of 
observations. 

See also Edgeworth [1908, part II]. 

The data seem to be from baptismal records: see p. 384 of the 
Théorie analytique des probabilités. 

More accurately, Laplace considers births in London and baptisms in 
Paris. 

Czuber, unlike Laplace (see previous note), views all the figures, for 
London and Paris, as births. 

Lidstone [1941] provides a discussion, based on a suggestion of G.F. 
Hardy, of a suitable choice of prior — see §9.12. 

A similar problem had been earlier considered by Laplace in his Sur 
les naissances...: see §7.8. 

My attention was drawn to the following points on reading F. Y. 
Edgeworth’s review of H. Westergaard’s Scope and Method of Statistics as 
reprinted in Mirowski [1994]. 

Westergaard [1968, p. 82] describes the 22nd of September 1802 as 
New Year’s Day; however, while the new Republican era certainly 
began on the 22nd of September 1792 (the day of the proclamation 
of the Republic), the division of the year into twelve months of thirty 
days each, together with the added five Sans-culottides days annually 
and an extra day every four years (the first falling in An III, i.e. 
1795) resulted in New Year’s Day in 1802 falling on what was almost 
everywhere else regarded as the 23rd of September. The new system was 
abandoned on the 1st of January 1806. For more details see the article 
“French Revolutionary Calendar” in the fourteenth edition of the 
Encyclopedia Britannica. 

The same change is made in the introductory Essai philosophique sur 
les probabilités: see Dale [1995, p. 40]. 

See Westergaard [1968, p. 83]. 

For details of the earlier treatment see Todhunter [1865, art. 1032]. 
See Todhunter [1865, art. 1036] for references to earlier discussions of 
this problem. 

No divorces! 

Independence is implicit here. 

It might well be argued whether the assumptions of what is now called 
a Bernoulli trial in fact hold here. 
Laplace follows this with the following words: “On formera ainsi, 
d’année en année, une Table des valeurs de 7. En faisant ensuite une 
somme de tous les nombres de cette Table, et en la divisant par n, on 
aura la durée moyenne des mariages faits à l’âge a pour les garçons 
et à l’âge a′ pour les filles” [p. 424]. 

The original has “infinis”. 

See Whittaker and Watson [1973]. 

Todhunter’s [1865] analysis in his Article 1037 is slightly different. In 
his next article he derives as a natural consequence of this newer 
analysis an extension of Bernoulli’s Theorem. 

The consideration of both the loss function and the prior infor-
mation is typical, indeed characteristic, of modern Bayesian 
decision theory (see Berger [1985, p. 158]). 

This case is described by Todhunter [1865, art. 1040] as “a modifica- 
tion of the problem just considered, which may be of more practical 
importance.” 

Edgeworth [1911] translates “l’espérance morale” as subjective advan- 
tage. 

Todhunter [1865, art. 1042] in fact sees this entire chapter as “mainly 
a reproduction of the memoir by Daniel Bernoulli.” 

For a discussion of this chapter see Shafer [1978, pp. 348-349]. 

See Walker [1929, p. 21] for further comment. 

There are some passages in the early parts of this Supplement that 
are not to be found in the memoirs. 

Comment on the matter of this section may be found in Bertrand 
[1907, chap. XIII] and Poisson [1837, pp. 2-7]. Todhunter [1865] does 
not discuss this section at all. Sheynin [1976] provides a brief sum-
mary, while Pearson [1978, pp. 690-692] gives a more detailed inves-
tigation. On the choice of the best jury system see Hacking [1984]. 

What Laplace means by an equitable (“juste”) opinion of the tri-
bunal is discussed earlier in this Section, where the following ques-
tion is posed: “la preuve du délit de l’accusé a-t-elle le haut degré de 
probabilité nécessaire pour que les citoyens aient moins à redouter 
les erreurs des tribunaux, s’il est innocent et condamné, que ses nou-
veaux attentats et ceux des malheureux qu’enhardirait l’exemple de 
son impunité, s’il était coupable et absous?” [p. 521]. A little later 
on [p. 522] he states that the decision of a tribunal is equitable if it 
conforms to the true (“vraie”) solution of the question. 

Laplace talks of the integral in the denominator below as a “somme”, 
and, as Pearson [1978, p. 692] has pointed out, no mention is made 
of the Euler-MacLaurin bridge. 

Laplace in fact discussed incomplete beta-functions in considering 
the incomplete binomial summation. This was perhaps not realized 
by Pearson (see Molina [1930, p. 376]). 

See Pearson [1978, pp. 691-692]. 
For details see the Théorie analytique des probabilités, p. 535. 

For a discussion of this assumption see p. 536 of the Théorie 
analytique des probabilités. 

See Lancaster [1966] for a discussion of Laplace’s determination of 
the posterior distribution of h, this distribution being viewed there 
as a forerunner of the χ² distribution. 

See p. 549 of this Supplement for a discussion of what happens if this 
latter assumption is not met. 

For some references to general remarks on the Théorie analytique des 
probabilités see Todhunter [1865, art. 1052]. 

Laplace’s work is discussed from the “inductive behaviour” versus the 
“inductive reasoning” point of view by Neyman [1957, pp. 19-21]. 

The following discussion owes much to Karl Pearson: see Pearson 
[1978, pp. 366-369]. 

A generalization of Bayes’s theorem is given in Pearson [1924b]. 

The French aphorism Revenons à ces moutons has an English equiv-
alent that is given in Booth [1865, p. 278] and is presented here for 
some light relief: 


A weighty lawsuit I maintain; 

’T is for three crab-trees in a lane. 

The trees are mine, there’s no dispute, 
But neighbour Quibble crops the fruit. 
My counsel, Bawl, in studied speech, 
Explores, beyond tradition’s reach, 
The laws of Saxons and of Danes, 
Whole leaves of Doomsday-book explains, 
The origin of tithes relates, 

And feudal tenures of estates. 

‘If now you’ve fairly spoke your all, 
One word about the crab-trees, Bawl.’ 


On the important distinction in psychological tests between asking 
subjects for responses that are as unpredictable as possible and ask- 
ing them to produce responses as randomly as possible see Ayton & 
Wright [1994, p. 173]. 


Chapter 8 


On the contribution actually made by Bernoulli see Pearson [1925]: 
Sheynin [1968] considers Pearson’s judgement of Part 4 of the Ars 
Conjectandi as unsatisfactory to be “hardly fair”. 

On the history of the Poisson distribution see Dale [1989], Good [1986, 
p. 166], Haight [1967] and Stigler [1982b]. 

The term 2/√π below is mistakenly given in the original [p. 271] as 

This example is again considered in Poisson’s Recherches sur la prob-
abilité des jugements: see Sheynin [1978, p. 272]. 


See Note 45 to Chapter 7. 

Poisson’s reference is to p. 383 of the Théorie analytique des proba-
bilités: this is p. 391 of the 1878-1912 Œuvres complètes edition. 

According to Haight [1967] “Certain authors give the date 1832 for 
Poisson’s Recherches” [p. 113]: none are cited by name, and I have 
found no evidence of a publication date preceding 1837. See also 
Maistrov [1974, p. 158]. 

For further comment on Poisson’s distinction between “chance” and 
“probability” see von Kries [1927, p. 275] and Daston [1994, pp. 335-
336]. Hacking [1984] finds the first clear distinction between subjec-
tive and objective probabilities in this work of Poisson’s, while Good 
[1986, pp. 157-158] considers Poisson’s concept of probability to be 
more that of logical probability, or credibility, than that of subjective, 
or personal, probability. 

On the importance of time-order in connexion with such conditional 
probabilities see Shafer [1982]. 

There is no twelfth section. 

The early history of the St Petersburg paradox is related in Proctor 
[1889] as follows: 


It occurred to the Russian government, which has at all 
times been notably ready to take advantage of scientific 
discoveries, that a method might be devised for despoiling 
the public more effectually than by the Geneva method. 
[p. 148] 


(In the Geneva lottery five numbers were drawn from ninety. If a sin- 
gle number were bet on, the drawing of that number would yield 15 
times the value of the stake for the bettor. Two chosen numbers being 
drawn would result in the payment of 270 times the stake, and so on: 
for further details of similar lotteries see the Notes to the section “On 
analytical methods in the probability calculus” in Dale [1995].) In the 
Russian scheme the prize was to be determined by the tossing of a 
coin, the speculator receiving 2^n currency units if the first head ap-
peared on the nth toss. As there was a chance, small though it might 
be, of very large winnings, it was felt that the masses would jump 
at the opportunity afforded by this lottery, and mathematicians were 
asked to determine the fair value of a chance, so that the entrance fee 
might be appropriately increased. Unfortunately, as Proctor notes, 


a high and practically prohibitory price must first be set 
on each chance, and even then the lottery-keepers could 
only escape loss by restricting the number of purchases. 
The scheme was therefore abandoned. [1889, p. 151] 
The idea of this kind of lottery has a certain fascination, however, 
and the scheme has even been described in the classics. For in W. M. 
Thackeray’s Character Sketches: Captain Rook and Mr. Pigeon we 
read 


there is a plan which the commonest play-man knows, an 
infallible means of retrieving yourself at play: it is simply 
doubling your stake. Say you lose a guinea: you bet two 
guineas, which if you win, you win a guinea and your orig- 
inal stake; if you lose, you have but to bet four guineas on 
the third stake, eight on the fourth, sixteen on the fifth, 
thirty-two on the sixth, and so on. It stands to reason that 
you cannot lose always; and the very first time you win, all 
your losings are made up to you. There is but one drawback 
to this infallible process: if you begin at a guinea, double 
every time you lose, and lose fifteen times, you will have 
lost exactly sixteen thousand three hundred and eighty four 
guineas — a sum which probably exceeds the amount of 
your yearly income; mine is considerably under that figure. 


[1869, pp. 396-397] 


For mathematical details of this paradox see Feller [1968, §X.4], 
Martin-Löf [1985] and Todhunter [1865, arts 389-393]: the matter is 
also discussed in Paty [1988, §3], while a detailed study may be found 
in Jorland [1987]. 
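The difficulty of fixing a fair entrance fee for the coin-tossing scheme can be seen from a short calculation. The following is a minimal sketch, assuming a payment of 2^n units when the first head appears on toss n, and a game stopped (with no payment) if no head has appeared by a hypothetical cut-off `max_tosses`, which is not part of Proctor's account.

```python
from fractions import Fraction


def truncated_fair_value(max_tosses: int) -> Fraction:
    """Expected winnings when the game is cut off after max_tosses tosses.

    The first head appears on toss n with probability 1 / 2**n and then
    pays 2**n units, so every admissible n contributes exactly one unit.
    """
    total = Fraction(0)
    for n in range(1, max_tosses + 1):
        total += Fraction(1, 2**n) * 2**n
    return total


if __name__ == "__main__":
    for cutoff in (5, 10, 20, 40):
        print(cutoff, truncated_fair_value(cutoff))
```

The truncated value equals the cut-off itself, so it grows without bound as the cut-off is relaxed, which is consistent with the "practically prohibitory price" mentioned in the quotation.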

In this generalization it is supposed that the random variables, al-
though still two-valued and independent, are in general differently 
distributed: i.e. Pr[X_i = 1] = p_i = 1 − Pr[X_i = 0] for each i. Good 
[1986, p. 160] regards Poisson’s Law of Large Numbers as “perhaps 
[his] main direct contribution to the mathematical theory of proba-
bility and statistics.” 
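The concentration asserted by a law of this type can be illustrated with exact arithmetic. The sketch below is not Poisson's own argument: it swaps in the later Chebyshev inequality to bound the probability that the observed proportion of successes differs from the average of the p_i by at least a chosen amount, and its function names are hypothetical.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def proportion_mean_and_variance(ps: Sequence[Fraction]) -> Tuple[Fraction, Fraction]:
    """Mean and variance of (X_1 + ... + X_n) / n for independent X_i,
    where Pr[X_i = 1] = p_i and Pr[X_i = 0] = 1 - p_i."""
    n = len(ps)
    mean = sum(ps, Fraction(0)) / n
    variance = sum((p * (1 - p) for p in ps), Fraction(0)) / n**2
    return mean, variance


def chebyshev_bound(ps: Sequence[Fraction], eps: Fraction) -> Fraction:
    """Upper bound on Pr[|proportion - mean| >= eps] from Chebyshev's inequality."""
    _, variance = proportion_mean_and_variance(ps)
    return min(Fraction(1), variance / eps**2)


if __name__ == "__main__":
    # Differently distributed trials whose p_i cycle through 1/4, 1/2, 3/4.
    for n in (30, 300, 3000):
        ps = [Fraction(1 + (i % 3), 4) for i in range(n)]
        print(n, chebyshev_bound(ps, Fraction(1, 10)))
```

Since each p_i(1 − p_i) is at most 1/4, the variance of the proportion is at most 1/(4n), so the bound shrinks as n grows whatever the individual p_i may be.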

Compare Poisson’s Article 46 discussed earlier. 

For a detailed discussion of this result, and Poisson’s generalization 
of it, see Keynes [1921, chap. XXIX], where Bernoulli’s theorem is 
viewed as one that “exhibits algebraical rather than logical insight” 
[p. 341]. Keynes declares that the conditions under which the theorem 
is valid are usually not realized in practice. 

Keynes [1921, chap. XXIX, §2] points out that the approximation in 
fact requires that μpq be large. 

Keynes [1921, chap. XXIX, §2] considers the simpler approximation 
(2/√π) ∫ exp(−t²) dt satisfactory, in practice, in view of all the ap-
proximations involved in the derivation. 

For comment on an oversight made by Poisson in the derivation of 
this result see de Morgan [1838b]. 

The factor “3” in the denominator is missing in Todhunter [1865, 
p. 556]. 
The Bayesian nature of Poisson’s approach here is commented on by 
Good [1986, p. 161]. 

The reference is to his Essai d’arithmétique morale. Further details 
of the experiment may be found in Keynes [1921, chap. XXIX]. 

A deep study of Poisson’s models for decisions taken by juries in both 
criminal and civil trials may be found in Gelfand and Solomon [1973]. 

There is no “not proven” verdict. 

Good [1986, pp. 167-168] sees in this work of Poisson’s a sequen-
tial use of Bayes factors. Note also the comment by Solomon [1986, 
pp. 174-176] on the Poisson jury model. 

Apart from those cases mentioned here, Poisson considers £ = 1/2, 
C= andl = 0) fo 1/2. 
See the first supplement to his Théorie analytique des probabilités. 

The copy of this tract in the Wishart Library of Cambridge Univer- 
sity bears on the cover “De Morgan on Probability”. Inside, an inked 
inscription reads “J.C. Adams Esq. from the Authors J.W. Lubbock 
& J.E.D. Bethune.” In its biographical note on Lubbock the Dic- 
tionary of National Biography says of this work “A binder’s blunder 
caused this work to be often attributed to De Morgan, despite his fre- 
quent disclaimers” [Vol. XII, p. 227]. According to the ninth edition 
[1877] of the Encyclopedia Britannica, de Morgan found that this 
error “seriously annoyed his nice sense of bibliographical accuracy.” 
The matter was only settled after fifteen years and a letter from de 
Morgan to the Times. 

The persistence of irritating errors in print has been nicely com- 
memorated, in parody of Longfellow, in the following epigram on Lord 
Campbell’s Lives of the Lord Chancellors: 


Lives of great men misinform us: 
Campbell’s lives in this sublime, 

Errors frightfully enormous, 
Misprints on the sands of time. 


[Booth, 1865, p. 284]. 

The generally accepted date of publication seems to be 1830. 

Briefly put, coherence (a notion first explicitly introduced by Ramsey 
in 1926 — see Ramsey [1965]) is defined with reference to degrees of 
belief as measured by betting behaviour: degrees of belief are said to 
be coherent if there is no set of bets that entails that the bettor will 
lose money no matter what event occurs. Coherence in this sense is 
equivalent to conformity of degrees of belief to the rules of the calculus 
of finite probability. The notion has undergone further development 
since Ramsey’s time: see Kyburg and Smokler [1980, pp. 13-15]. 
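The betting definition of coherence can be made concrete with a two-event example. What follows is a minimal sketch under stated assumptions: the bettor is taken to buy, at prices equal to the stated degrees of belief, one ticket paying a unit stake if A occurs and one paying a unit stake if A does not occur. The function names are hypothetical and only the case of beliefs summing to more than one is examined (for a sum below one the bettor would be made to sell rather than buy).

```python
from fractions import Fraction


def bettor_net_gain(belief_a: Fraction, belief_not_a: Fraction,
                    a_occurs: bool, stake: Fraction = Fraction(1)) -> Fraction:
    """Net gain of a bettor who buys both tickets at prices belief * stake."""
    cost = (belief_a + belief_not_a) * stake
    payoff_a = stake if a_occurs else Fraction(0)
    payoff_not_a = Fraction(0) if a_occurs else stake
    return payoff_a + payoff_not_a - cost


def entails_sure_loss(belief_a: Fraction, belief_not_a: Fraction) -> bool:
    """True if the bettor loses money whether or not A occurs."""
    return all(bettor_net_gain(belief_a, belief_not_a, outcome) < 0
               for outcome in (True, False))


if __name__ == "__main__":
    print(entails_sure_loss(Fraction(3, 5), Fraction(3, 5)))  # beliefs sum to 6/5
    print(entails_sure_loss(Fraction(3, 5), Fraction(2, 5)))  # beliefs sum to 1
```

Degrees of belief in A and in not-A that sum to one, as the calculus of finite probability requires, leave the bettor with a net gain of zero in both cases.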
The term equation of condition is defined by Chrystal as follows: 

Consider any two functions whatever, say φ(x, y, z, ... a, b, 
c, ...), and ψ(x, y, z, ... a, b, c, ...), of the variables x, y, 
z, ..., involving the constants a, b, c, .... 

If the equation 


φ(x, y, z, ... a, b, c, ...) = ψ(x, y, z, ... a, b, c, ...)    (1) 


be such that the left-hand side can, for all values of the vari-
ables x, y, z, ..., be transformed into the right by merely 
applying the fundamental laws of algebra, it is called an 
identity. ... 

If, on the other hand, the left-hand side of equation (1) 
can be transformed into the right only when x, y, z, ... 
have certain values, or are conditioned in some way, then 
it is said to be a Conditional Equation, or an Equation of 
Condition. [1904, p. 282] 


30. The proof is effected in part by noting that 1² + 2² + ... + r² is the 
coefficient of x²/2 in Σ_{j=1}^{r} e^{jx}: the more general result involving pth 
rather than second powers may be found in Knopp [1990, §64 B]. 


31. The multinomial coefficient given here and above as ere is 


hy 


given as (™!*'"" 7) by Lubbock and Drinkwater-Bethune. 


LyrseyIMhy 


32. Kneale’s [1949, pp. 203-204] criticism of the rule of succession is based 


on ignorance of this extension. 


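33. Details of Lubbock’s mathematical argument may be supplied as fol-
lows: note firstly that 

\[
\frac{\partial^{n_1+n_2}}{\partial x^{n_1}\,\partial y^{n_2}}\,
  x^{n_1} y^{n_2} (ex + fy)^m
= \sum_{j=0}^{m} \binom{m}{j} e^{m-j} f^{j}\,
  \frac{\partial^{n_1+n_2}}{\partial x^{n_1}\,\partial y^{n_2}}\,
  x^{n_1+m-j}\, y^{n_2+j}
= \sum_{j=0}^{m} \binom{m}{j} e^{m-j} f^{j}\,
  \frac{(n_1+m-j)!}{(m-j)!}\,\frac{(n_2+j)!}{j!}\,
  x^{m-j} y^{j}.
\]

Evaluation of this at (x, y) = (1, 1) results in 

\[
\sum_{j=0}^{m} \binom{m}{j}\,
  \frac{(n_1+m-j)!\,(n_2+j)!}{(m-j)!\,j!}\, e^{m-j} f^{j},
\]

as asserted by Lubbock. 

Notice next that on expanding 

\[ f(h, k) = (1+h)^{n_1}(1+k)^{n_2}(1 + eh + fk)^m \]

in a Taylor series about (0,0), one finds that the term in h^{n_1}k^{n_2} is 

\[
\frac{1}{(n_1+n_2)!}\binom{n_1+n_2}{n_1}
\left.\frac{\partial^{n_1+n_2} f}{\partial h^{n_1}\,\partial k^{n_2}}\right|_{(0,0)}
h^{n_1} k^{n_2}.
\]

Thus n_1! n_2! multiplied by the coefficient of h^{n_1}k^{n_2} is 

\[
\left.\frac{\partial^{n_1+n_2}}{\partial h^{n_1}\,\partial k^{n_2}}\, f(h, k)\right|_{(0,0)}.
\]

Next, note that, on substituting 1 + t and 1 + s for x and y respec-
tively, we have 

\[
\left.\frac{\partial^{n_1+n_2}}{\partial x^{n_1}\,\partial y^{n_2}}\,
  x^{n_1} y^{n_2} (ex + fy)^m\right|_{(1,1)}
= \left.\frac{\partial^{n_1+n_2}}{\partial t^{n_1}\,\partial s^{n_2}}\,
  (1+t)^{n_1}(1+s)^{n_2}\{e(1+t) + f(1+s)\}^m\right|_{(0,0)}
= \left.\frac{\partial^{n_1+n_2}}{\partial t^{n_1}\,\partial s^{n_2}}\,
  (1+t)^{n_1}(1+s)^{n_2}(1 + et + fs)^m\right|_{(0,0)},
\]

since e + f = 1. 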


For a short discussion of Bolzano’s introduction of probability see 
Novy [1980, pp. 30-31]. 

A biography of de Morgan may be found in Heath’s edition of 1966 
of de Morgan’s On the Syllogism and Other Logical Writings. For 
details of de Morgan’s work on the history of science, Rice [1996] 
may profitably be consulted. 

MacFarlane [1916, p. 19]. The conundrum is given by de Morgan 
himself, when commenting on a work on the quadrature of the circle 
by one James Smith, as follows: 


I was X years old in A.D. X²; not 4 in A.D. 16, nor 5 in 
A.D. 25, but still in one case under that law. And now I 
have made my own age a problem of quadrature, and Mr. 
J. Smith may solve it. But I protest against his method of 
assuming a result, and making itself prove itself: he might 
in this way, as sure as eggs is eggs (a corruption of X is X), 
make me 1,864 years old, which is a great deal too much. 


[de Morgan 1915, vol. II, p. 124] 


James Smith was an indefatigable writer on the squaring of the circle, 
and de Morgan says of him 


He is beyond a doubt the ablest head at unreasoning, and 
the greatest hand at writing it, of all who have tried in our 
day to attach their names to an error. 

[de Morgan 1915, vol. I, p. 104] 


Heath [1966, p. vii] asserts that the defective eye was the right. 
Smith [1982] lists 200-odd items, with certain deliberate omissions. 
This excludes references to probability in logical works. 

Arne Fisher is generally disparaging in his remarks on inverse proba- 
bility — see his [1926, §40] and the more detailed criticism in §45 of 
that work. 

See Stigler [1975, p. 507] and Smith [1982, p. 142]. 

Commenting on this quotation Keynes [1921, chap. XVI, §14] writes 
“If this were true the principle of Inverse Probability would certainly 
be a most powerful weapon of proof, even equal, perhaps, to the heavy 
burdens which have been laid on it. But the proof given in Chapter 
XIV. makes plain the necessity in general of taking into account the 
a priori probabilities of the possible causes.” 

Particular mention is made on p. 64 of the case in which n = 0. 

The date is variously given as 1837 (Smith [1982] and the Edin-
burgh University Library Catalogue), 1845 (Encyclopedia Britan-
nica, 14th edition, and the National Union Catalogue), and 1849 
(Keynes [1921]). Stigler [1986a, p. 378], citing Sophia de Morgan’s 
Memoir of Augustus De Morgan of 1882, says that this article was 
written in 1836-1837. 

On de Morgan’s neglect of the independence condition and attendant 
difficulties see Hailperin [1996, pp. 92, 95]. 

See also Hailperin [1996, pp. 96-98]. 

See Hailperin [1996, §2.2]. 

Commenting on these solutions Hailperin notes 


the impressive degree of sophistication which is apparent in 
De Morgan’s handling of the propositional logic involved in 
his solutions. [1996, p. 101] 


Details of de Morgan’s argument may be found in Hailperin [1996, 
pp. 101-103]. 

Hailperin [1988, p. 164] tantalizingly suggests that de Morgan’s ques-
tion can be answered by using the theory of Hailperin [1986, §6.7]. 

Hailperin [1996, p. 105] finds two reasons for this inadequacy, viz. 
(a) the possibility that the combined testimony may be inconsistent 
is ignored, and (b) the tacit assumption of the independence of the 
testimony and the arguments. 

This problem Hailperin [1996, pp. 105-106] finds “unusual”. 

For further discussion of this memoir and other works by Bienaymé 
see Heyde & Seneta [1977]. 
This circumlocution is necessary: the work was printed twice, once in 
L’INSTITUT, Journal général des Sociétés et travaux scientifiques de 
la France et de l’Étranger, and again as an Extrait des procès-verbaux 
of the Société Philomatique de Paris. Both versions appeared in 1840, 
in each case under the general heading Probabilités. 

The original has m(m−1)...(m−r+1)/r(r−1)...1 instead of \binom{m}{r} 
etc. 

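The situation here is analogous to that sometimes encountered in 
descriptive statistics, where a sample variance may be defined by 
either 

\[ s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \]

or 

\[ \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \]

whence 

\[ \hat{\sigma}^2 = \frac{n}{n-1}\,s^2. \]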

Gnedenko [1966, p. 95] notes that the maximum value of the binomial 
probability \binom{n}{m} p^m (1 − p)^{n−m} occurs at the integral part of (n + 1)p. 
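The statement about the location of the maximum can be verified by direct enumeration. The sketch below assumes exact rational p and uses hypothetical function names. When (n + 1)p is itself an integer the maximum is attained at two adjacent values of m, and `max` then reports the smaller of them.

```python
from fractions import Fraction
from math import comb, floor


def binomial_probability(m: int, n: int, p: Fraction) -> Fraction:
    """Probability of exactly m successes in n independent trials."""
    return comb(n, m) * p**m * (1 - p)**(n - m)


def most_probable_m(n: int, p: Fraction) -> int:
    """Value of m maximising the binomial probability (first one if tied)."""
    return max(range(n + 1), key=lambda m: binomial_probability(m, n, p))


if __name__ == "__main__":
    for n, p in ((10, Fraction(3, 10)), (20, Fraction(1, 4)), (15, Fraction(2, 3))):
        print(n, p, most_probable_m(n, p), floor((n + 1) * p))
```

In each case printed the enumerated maximiser can be compared with the integral part of (n + 1)p.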
His birth and death dates are 12th September 1801 and 20th Decem- 
ber 1861, a. St., or 24th September 1801 and 1st January 1862, n. St. 
For a discussion of Ostrogradskii’s work see Gnedenko [1951] and 
Maistrov [1974, pp. 180-187]. A number of his publications are listed 
in the Tableau général méthodique et alphabétique ... de St.-Péters- 
bourg depuis sa fondation. 

Ostrogradskii gives his answers in terms of the Vandermonde symbol 
[x]^n = x(x−1)...(x−n+1). 

Todhunter [1865, p. 558] remarks that Galloway’s article may be 
viewed as “an abridgement of Poisson’[s] Recherches ... sur la Prob.” 
The article was in fact also published as a book. 

Much the same example had been discussed earlier by Bénard [1835], 
Mondesir [1837] and Poisson [1837]. However, while Bénard and Cata- 
lan consider the case in which the second draw is made from the urn 
B, Mondesir and Poisson suppose that the second draw is also from 
urn A. 

Just as all roads lead to Rome, at least in two dimensions (see 
Feller [1968, §XIV.7]). 

In a note to this theorem Catalan points out that “probability” is 
to be understood here as meaning what some mathematicians have 
called probabilité subjective (or probabilité extrinséque, as he prefers 
to call it) in contrast to probabilité intrinséque. 




In a note to his subsequent paper of 1884 Catalan states that the 
phrase “sont égales a 6” occurring at the end of the following quota- 
tion should in fact be merely “sont égales”. 

In the first edition of this book I misinterpreted Catalan’s table: I am 
grateful to Jongmans and Seneta [1994] for their having drawn this 
slip to my attention. 

Jongmans and Seneta [1994] note that if the sequence of two urns 
considered here is continued, even ad infinitum, then the process 
{(N_i, X_i) : i ∈ ℕ} is a bivariate Markov chain. Moreover if 𝔉_i = 
σ{(N_0, X_0), ..., (N_i, X_i)}, then {(N_i, X_i), 𝔉_i} is a martingale. 

In a footnote Catalan points out that a similar result had been ob-
tained by J.B.J. Liagre in his Calcul des probabilités et théorie des 
erreurs, avec des applications aux sciences d’observation en général, 
et à la géodésie en particulier of 1852. 

Porter [1986, p. 85] regards Fries (“competent if not original in math-
ematics”) as the introducer in Germany of the frequentist viewpoint. 

Edgeworth [1885, p. 192] regards the discussion of posterior proba-
bilities in this chapter as “masterly”. 

Sheynin [1991, §4] attributes the introduction of the term “Bayes 
formula” to Cournot. 

There is much discussion in Cournot’s book of chance and probability
in objective and subjective settings: in fact, Cournot makes the following
distinction: “le terme de possibilité se prend dans un sens objectif,
tandis que le terme de probabilité implique dans ses acceptions ordinaires
un sens subjectif” [p. 81]. For further comment on this point
see Hacking [1971, p. 343], Keynes [1921, chap. XXIV, §3] and Porter
[1986, pp. 84-85]. Zabell [1988c] in fact finds, in Cournot’s work, three
distinct categories of probability: objective, subjective and philosophical,
“the last involving situations whose complexity precluded
mathematical measurement” [p. 178]. According to Sheynin [1986,
p. 308], Chuprov [1910, p. 30] described Cournot as “one of the most
original and profound thinkers of the 19th century, whom his contemporaries
... had failed to appreciate and who rates higher and higher
in the eyes of posterity.” Edgeworth [1885, p. 192] considers him “a
first-rate authority”.

See Lancaster [1994, §5.4] for a discussion of the history of work on 
the inheritance of the sex ratio, including the question of its constancy 
in sibships.

These are given by Cournot as Mercury, Venus, the Earth, Mars,
Vesta, Juno, Ceres, Pallas, Jupiter, Saturn and Uranus. 

No doubt some village Brahe, or mute inglorious Kepler. 

Courts of first instance were given primary jurisdiction, mainly for 
civil cases. 

For further details see, in addition to the original, Laplace’s Essai
philosophique sur les probabilités of 1814 (translated in Dale [1995]).

For details of these changes see Robson’s edition, of 1974, of Mill’s
Collected Works.


79. In the first edition Mill in fact concluded that “the condition which 
Laplace omitted is not merely one of the requisites for the possibility
of a calculation of chances; it is the only requisite” [§2]. Mill’s change 
of heart was to a large extent brought about by criticism invited by 
Mill from John Herschel. This criticism was incorporated into the 
Logic as early as 1846. For further details see Strong [1978, §3] and 
Porter [1986, p. 83]. Porter (op. cit., p. 82) regards Mill’s comments 
in the first edition of his Logic as “one of the harshest denunciations 
of classical probability written in the nineteenth century.” 

Further evidence for Mill’s support for a frequency approach to prob- 
ability is provided by the following remarks: 


Before applying the doctrine of chances to any scientific 
purpose, the foundation must be laid for an evaluation of 
the chances, by possessing ourselves of the utmost attain- 
able amount of positive knowledge. The knowledge required 
is that of the comparative frequency with which the differ- 
ent events in fact occur. For the purposes, therefore, of the 
present work, it is allowable to suppose, that conclusions 
respecting the probability of a fact of a particular kind, rest 
on our knowledge of the proportion between the cases in 
which facts of that kind occur, and those in which they do 
not occur: this knowledge being either derived from specific 
experiment, or deduced from our knowledge of the causes 
in operation which tend to produce, compared with those 
which tend to prevent, the fact in question. 


[1872, Book III, Chap. XVIII, §3.]


Strong [1978, p. 34] notes Mill’s argument for a frequency theory of 
probability, some 25 years before John Venn. 

We have already (see §3.6) noted the special role played by the first 
occurrence of an event. In the 1851 edition of his System of Logic Mill 
added the following footnote: 


After the first time of happening, which is, then, more 
important to the whole probability than any other single 
instance (because proving the possibility), the number of 
times becomes important as an index to the intensity or 
extent of the cause, and its independence of any partic- 
ular time. If we took the case of a tremendous leap, for 
instance, and wished to form an estimate of the probabil- 
ity of its succeeding a certain number of times; the first 
instance, by showing its possibility (before doubtful) is of 
the most importance; but every succeeding leap shows the 
power to be more perfectly under control, greater and more
invariable, and so increases the probability ... 


[Book III, Chap. XVIII, §4.] 


82. See Sheynin [1986] for a detailed study of Quetelet’s statistical work. 




The definition of probability given in this quotation may also be found 
in Quetelet [1828], where, in Lesson XII, “On the calculation of a 
probability, when the number of favourable chances are not known”, 
we read 


Probability is a fraction which has for its denominator the 
number 2, multiplied as many times into itself, as the event 
has been observed consecutively; and, for its numerator, 
this same product, less 1. [p. 57] 


(Quotations here are from Beamish’s translation of 1839.) However, 
some confusion about this definition seems to be felt, for later on we 
find the words 


The probability may thus be calculated, that an event will 
be reproduced many times, which had already been ob- 
served for a certain number of times in succession ... This 
probability will be represented by a fraction, which has for 
its numerator the number of observations made plus 1, and 
for its denominator the same number plus 1, and plus, also, 
the number of times that the event ought to be reproduced. 
[pp. 57-58] 


And yet again 


When we observe two sorts of events, the probability that 
one of these events will reproduce itself once, is a fraction 
which has for its numerator, the number of times that the 
event in question has been observed plus 1; and, for its 
denominator, the total number of observations plus 2. 


[pp. 59-60] 
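
The three verbal rules quoted from Quetelet can be set side by side in a short sketch. The function names below are ours and purely illustrative, and the reading of each rule as an exact fraction is an assumption about what the translated text intends.

```python
from fractions import Fraction


def first_rule(n):
    """Lesson XII definition: denominator 2 multiplied into itself n times,
    numerator the same product less 1, for an event observed n times running."""
    return Fraction(2**n - 1, 2**n)


def repeat_rule(n, m):
    """Event seen n times in succession, to be reproduced m more times:
    (n + 1) / (n + 1 + m)."""
    return Fraction(n + 1, n + 1 + m)


def two_event_rule(k, total):
    """Event seen k times in `total` observations of two sorts of events:
    (k + 1) / (total + 2), the probability of one further occurrence."""
    return Fraction(k + 1, total + 2)


# An event observed on three occasions out of three.
print(first_rule(3), repeat_rule(3, 1), two_event_rule(3, 3))
```

For three observations the second and third rules agree (both give 4/5) while the first gives 7/8, which is perhaps the confusion alluded to above.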


See p. 15 of Downes’s translation of Quetelet [1846].

La Grande Encyclopédie has no record of his death: it was, however,
certainly after 1876. His christian names are given in varying orders.

Although Ellis was Senior Wrangler in Trinity College, Cambridge,
in 1840, his biographer Harvey Goodwin states that


it is a mistake to suppose that Ellis was in any exclusive 
or even preponderating degree devoted to mathematics: his 
mathematical power was no doubt very great, but I think 
not greater than several other powers, and certainly his 
taste by no means exclusively leaned in this direction. 

[W. Walton, 1863, p. xx] 
Goodwin notes further that “[Ellis] always seemed to talk on the
subject of Probabilities with great pleasure, and as one in which he 
was thoroughly at home” (op. cit., p. xxix). 

In his paper “On the application of the theory of probabilities 
to the question of the combination of testimonies or judgements” 
of 1857, Boole writes of Ellis “There is no living mathematician for 
whose intellectual character I entertain a more sincere respect than I 
do for that of Mr. Ellis” [Boole 1952, p. 350]. 

Salmon [1980a] does not regard Ellis, rather than Venn, as the first 
frequentist: he in fact concludes that “Ellis ... took us to the very 
threshold of a frequency theory of probability; Venn opened the door 
and led us in” [p. 143]. Boldrini [1972, p. 124] finds the relative fre- 
quency conception of probability “Formulated many years ago in Italy 
by G. Mortara”. For some discussion see Boldrini, op. cit., pp. 140- 
141. 

There is also some discussion, though less detailed, in Maistrov [1974, 
pp. 173-180]; one should note the comments in Sheynin [1991-1992] 
on this discussion.

This example, given in Laplace’s Essai philosophique sur les
probabilités, runs as follows:


Nous voyons sur une table, des caractères d’imprimerie, dis-
posés dans cet ordre, Constantinople; et nous jugeons que
cet arrangement n’est pas l’effet du hasard, non parce qu’il
est moins possible que les autres, puisque si ce mot n’était
employé dans aucune langue, nous ne lui soupçonnerions
point de cause particulière; mais ce mot étant en usage
parmi nous, il est incomparablement plus probable qu’une
personne aura disposé ainsi les caractères précédens, qu’il
ne l’est que cet arrangement est dû au hasard. [1814, p. 11]


For comment on this example see Dale [1995, p. 9, Note 22]. 

For a general discussion of Donkin’s work see Zabell [1988c, §6.1]. 
Zabell (op. cit., p. 180) regards Donkin as representing “what may 
be the highwater mark in the defense of the Laplacean position”.
Porter [1986, p. 122] goes so far as to refer to “the subjectivist W.F. 
Donkin”. 


See Newcomb [1860b, §23]. 

If, in Donkin’s theorem, we consider $n$ probabilities $p_1, p_2, \ldots, p_n$,
the first $r$ of which are unchanged by the new information while
the remainder are altered to $q_{r+1}, \ldots, q_n$ (say), we find, on setting
$\beta = \sum_{r+1}^{n} q_i$, that $p'_1 + \cdots + p'_r = 1 - \beta$. On defining $p'_i = p_i/f$, for
$i \in \{1, 2, \ldots, r\}$, we obtain $p_i : p'_i :: p_j : p'_j$, as asserted by Donkin. If the
$p_i$’s are initially the same ($= 1/n$), the entropy $H_n = -\sum_{1}^{n} p_i \log p_i$
is maximal, and this maximality is preserved, under the changes in
probabilities mentioned above, by replacing each of the unchanged $p_i$
by $p_k = (1 - \beta)/r$.
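
The entropy remark can be checked numerically on a small case. The sketch below is ours and not Donkin's: the function names, the choice n = 5 and r = 3, and the revised values 0.3 and 0.1 are all hypothetical, and the comparison is made only against one alternative allocation of the same total mass 1 − β.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p log p, skipping zero terms."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def spread_evenly(r, q):
    """Give each of the r untouched hypotheses the value (1 - beta)/r,
    where beta is the total of the revised probabilities q."""
    beta = sum(q)
    return [(1 - beta) / r] * r + list(q)


q = [0.3, 0.1]                  # hypothetical revised values q_4, q_5
even = spread_evenly(3, q)      # roughly 0.2 for each of the first three
uneven = [0.3, 0.2, 0.1] + q    # same total 1 - beta, unequal shares

print(entropy(even) > entropy(uneven))
```

With the revised values held fixed, the equal allocation gives the larger entropy in this instance, in agreement with the statement in the note.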

As noted by Keynes [1921, chap. XVI, §13]. 

Notation altered: here all probabilities are supposed to be conditional 
on some fundamental state of knowledge. 

The justification of the assumed conditional independence of C and P 
given & in the third of the following formulae is unclear to me. Donkin 
merely states that the product of the appropriate expressions be taken 
to avoid the committing of “the fallacy of treating a provisional value 
as if it were definitive” [p. 364]. 

For a full discussion of Boole’s work in logic and probability see 
Hailperin [1976] and [1986]. 

Recently, in an unpublished paper, Lagarias discussed the relation- 
ship between Boole’s general method and maximum entropy. Given 
certain information, Boole considers two broad questions, viz. 

(1) is there a set of probabilities consistent with that information? 
and 

(2) if there are several probabilities consistent with the information, 
which should be chosen as being “most reasonable”? 

While Lagarias sees the first question essentially as a problem in lin- 
ear programming, the second is viewed as requiring the imposition of 
an additional criterion for its solution, that criterion being given by 
Boole as 


When the data have been translated into probabilities of 
events connected by conditions logical in form and explic- 
itly known, the problem may be constructed from a scheme 
of corresponding ideal events which are free, and of which 
the probabilities are such that when they (the ideal events) 
are restricted by the same conditions as the events in the 
data, their calculated probabilities will become the same as 
the given probabilities of the events in the data. 

[1862, p. 227]


With this criterion Lagarias finds that Boole’s work provides not only
a probability calculus, but also a method of inverse probability.

Hailperin [1996, p. 107] finds part of the noteworthiness of this paper
to inhere in Boole’s indicating that Pr[B → A] is not identical to
Pr[A → B]: see also our §5.4.

See Boole [1952, p. 261] for further details.

For a broad discussion of this paper see Hailperin [1996, §2.4].

Details of the discussants are given in Boole [1952, p. 271] and Keynes
[1921, chap. XVII, §2]. A wide-ranging and excellent discussion of the
“challenge problem” was given by Hailperin [1986, §§6.2-6.3].
Whether Boole himself would have approved of the appellation is
perhaps doubtful, for in the paper in which this problem was posed
he wrote


While hoping that some may be found who, without de- 
parting from the line of their previous studies, may deem 
this question worthy of their attention, I wholly disclaim 
the notion of its being offered as a trial of personal skill 
or knowledge, but desire that it may be viewed solely with 
reference to those public and scientific ends for the sake of 
which alone it is proposed. [1851c, p. 286] 


Details of events held to celebrate the centenary of Cayley’s death in 
1895, together with some biographical information, are given in Gray 
[1995]. 

This problem is discussed in Hailperin [1996, §5.5]. 

In commenting on this paper Keynes writes “Boole’s mistake was
pointed out, accurately though somewhat obscurely, by H. Wilbraham”
[1921, p. 167].

See also Boole [1854d]. 

Further comparison of Boole’s and Cayley’s solutions may be found 
in Boole [1854e]. 

Notation altered. 

MacColl was perhaps the first to introduce a specific notation for 
conditional probability: in his fourth paper, of 1880, he writes 


The symbol $x_a$ denotes the chance that the statement $x$ is
true on the assumption that the statement $a$ is true,
[p. 113]


a notation that is changed to $\frac{x}{a}$ in the sixth paper of 1897. We shall
adopt the more standard notation here.

One should also note that a similar notation was introduced by
Peirce more than a decade before MacColl’s fourth paper, though
perhaps with a slightly different meaning. Peirce [1867b] wrote


Let $b_a$ denote the frequency of b’s among the a’s. Then
considered as a class, if a and b are events, $b_a$ denotes the
fact that if a happens b happens.


While noting MacColl’s remarks on Boole’s work, Keynes says that 


MacColl ... saw that Boole’s fallacy turned on his defini- 
tion of Independence; but I do not think he understood, at 
least he does not explain, where precisely Boole’s mistake 
lay. [1921, p. 167]


For comment on Boole’s confounding of “conditional probability”
and “probability of a conditional” see Jaynes [1976, pp. 241-242].

On Boole’s blurring of the distinction between logical and stochastic
independence see Hailperin [1996, pp. 219, 222-224].

See also Hailperin [1986, p. 368].

Note also that Hailperin concludes that the best possible bounds on 
Pr[E] coincide “with the minimum of Boole’s upper limits and the
maximum of his lower limits” [1986, p. 370]. 

In Chapter XVI, §7 Boole gives a summary of principles taken chiefly 
from Laplace: this summary, as Molina [1930, p. 384] has noted, does 
not include the Laplacean generalization of Bayes’s Theorem. 

This chapter Stigler [1984], in his review of Smith [1982], views as “an
early contribution to the theory of upper and lower probabilities and 
the combination of evidence.” Upper and lower probabilities, in turn, 
may be viewed as special cases of upper and lower previsions, for 
further details on which Walley [1991] may profitably be consulted. 
Hacking [1984] traces the first explicit use of upper and lower proba- 
bilities to Ostrogradskii [1838]. 

It might be suggested that upper and lower probabilities provide 
some sort of measure of the vagueness that might be felt to exist in 
the assessment of probabilities. Not all writers, however (even those 
with Bayesian leanings), see a need for such probabilities, and Lindley 
[1971, pp. 114-116] has persuasively argued that a single probability 
is sufficient for the making of decisions. 

See Keynes [1921, p. 191]. 

The superficial similarity between this result and the rule of succes- 
sion has been noted by Hailperin [1986, pp. 372-373]. The distinguish- 
ing feature here is two-fold: (a) the probability of the occurrence of 
the event concerned is known, and (b) the event has a permanent 
cause. 

See Boole [1854a, p. 362]. 

See Keynes [1921, pp. 192-194]. 

See Keynes [1921, chap. XXX, §14]. 

This problem also received attention in the nineteenth century from 
Hagen [1837]. 

Jaynes [1976, p. 241] notes that Boole “did not reject it [Laplace’s
work] on the ground of the actual performance of Laplace’s results
in the case of the uniform prior because he, like Laplace’s other crit- 
ics, never bothered to examine the actual performance under these 
conditions”. For further comment on Boole and the principle of in- 
sufficient reason see Zabell [1988c, §6.2]. Zabell [1989a, p. 249] has 
noted that Boole’s objection is to the principle of insufficient reason 
rather than to the principle of cogent reason. Boole himself amplified 
his thoughts on this point in a paper published in 1862 [pp. 227-228] 
(see also Boole [1952, p. 390]). 

See Edgeworth [1884a, p. 208] and Keynes [1921, chap. IV, §9].

Jaynes [1976, p. 242] in fact claims that “all of ‘Boolean algebra’ was
contained already in the rules of probability theory given by Laplace”.

Weierstrass’s and Bernstein’s Theorems were published in 1885 and
1912 respectively: statements of these results are given in Feller [1966].

This section is commented on by Keynes [1921, chap. XVI, §16]: see
also Hailperin [1986, §6.4].

Terrot, elected Bishop of Edinburgh and Pantonian professor in 1841, 
was chosen primus of Scotland in 1857, an office that he held until a 
stroke of paralysis forced his resignation in 1862. He was a Fellow and 
Vice-president of the Royal Society of Edinburgh, and contributed 
several papers to its journals.

On the assimilation of the problem as stated to the “bag and balls”
case compare Boole’s An Investigation of the Laws of Thought, chap. 
XX, §23. 

The extra binomial coefficient required here if order is not considered 
will cancel out in the final analysis. 

The solution given here is in fact that given by Keynes [1921, chap. 
XXX, §11]: for further details of the evaluation of the ratio of the two 
finite sums see our §8.22. 

Meyer’s year of birth is sometimes given as 1803: the day was the
31st May, so the difference cannot be due to old versus new style.

Correct, that is, except for a few misprints.

The thirty-first chapter of Keynes [1921] is devoted to this result. 
For comment on the contributions of Bernoulli and de Moivre see 
Pearson [1925]. 

See MacKenzie [1981, pp. 236-237]. 

Venn in fact collaborated with Galton in the study of heredity — see 
Porter [1986, p. 271]. 

Porter [1986, p. 87] considers this work to be “The most influential
nineteenth-century work on the philosophy of probability”, while
Mill, in the 1872 edition of his A System of Logic, Ratiocinative and
Inductive, describes it as


one of the most thoughtful and philosophical treatises on 
any subject connected with Logic and Evidence, which have 
been produced, to my knowledge, for many years. 

[Book III, Chap. XVIII, §6.] 


References throughout this section are to this edition. Venn made 
considerable changes in the second and third editions, but we have 
restricted our attention here to the last, which no doubt presented 
his considered views on probability and allied matters. 

Venn’s contention that the question of time is extraneous to probability
considerations is in conflict with the views expressed by Shafer
[1982]. 

The term is introduced, on p. 190, with “A word of apology”. For a 
discussion of this concept see Salmon [1980b, pp. 131-132].

Salmon [1980b, p. 133] finds “basic misunderstandings” in this chapter.

The reference given in this quotation is to Fisher’s Statistical Meth- 
ods and Scientific Inference, chapter II, §3. Fisher states that Venn’s 
examples “seem to be little more than rhetorical sallies intended to 
overwhelm an opponent with ridicule. They scarcely attempt to con- 
form with the conditions of Bayes’ theorem, or of the rule of succession 
based upon it” [pp. 25-26]. 

For names of others who rejected the rule of succession — and of those 
who accepted it — see Keynes [1921, chap. XXX, §14]. Edgeworth 
[1884b], while agreeing in the main with Venn’s views, was led to 
conclude that “the particular species of inverse probability called the 
‘Rule of Succession’ may not be so inane as Mr. Venn would have us 
believe” [p. 234]. 

The title, apart from the first three words, also changed with the 
various editions; all are listed in the Bibliography. 

The exercises changed from one edition to another; so although we 
do not give a comprehensive treatment of Whitworth’s inventiveness 
here, enough is said to give a good idea of the type of question he 
considered. 

References given here as [p. n] refer to page n of the fifth edition of 
Choice and Chance of 1901. 


Chapter 9 


Laurent usually used only the third of his christian names.

Given in Article 16 of Book II of the Théorie analytique des probabilités.

For later discussion of the inversion of Bernoulli’s theorem see Castoldi
[1959], Dale [1988b] and Jordan [1923], [1925], [1926a], [1926b]
and [1933].

For biographies of Jevons see FitzPatrick [1960, pp. 53-58] and Keynes
[1936].

Writing of Jevons’s work in general Zabell says


In truth, there is little new in Jevons, but despite his many
weaknesses, he represents a clear and succinct statement of
the Laplacean position. [1989b, p. 299]


In his interpretation of probability Jevons was diametrically opposed
to Venn: his views followed from those of de Morgan, Laplace and
Poisson. See Porter [1986, pp. 175-176] and Strong [1976, §6].

Further comment may be found in Keynes [1921, chap. IV, §4].

Terrot considered the following situation (see Hailperin [1996, p. 124]
for further discussion): suppose that A has seen p white and q - p black
balls placed in an urn, and that B has seen r white and s - r black
balls similarly introduced into the urn. A and B will then respectively
estimate the probability of the drawing of a white ball from the urn
as p/q and r/s. If the facts from which the inferences are drawn are
communicated, then the probability of the drawing of a white ball
will become (p+r)/(q+s). After some discussion Terrot wrote


I cannot conclude without suggesting a doubt, whether 1/2
be at any time the proper expression for the probability of
an event which is “neither likely nor unlikely in regard of
evidence”.

It seems more analogous with the practice in other cases
to express such probability by the indefinite fraction 0/0. If
this expression be applied to either of the probabilities con-
stituting the compound probability (p+r)/(q+s), the com-
pound probability will be reduced to the remaining simple
probability, for (0+r)/(0+s) = r/s. And this agrees with
the necessary action of the mind, which takes no note of its
original ignorance, after it has arrived at a definite proba-
bility from partial knowledge. [1857, p. 375]


A similar sentiment was expressed by Boole in a later issue of the 
same journal: 


It is a plain consequence of the logical theory of probabili-
ties, that the state of expectation which accompanies entire
ignorance of an event is properly represented, not by the
fraction 1/2 but by the indefinite form 0/0. And this agrees
with a conclusion at which Bishop Terrot, on independent,
but as I think just grounds, has arrived.


[Boole, 1952, p. 346] 


It must of course be borne in mind that 0/0 is a symbol or form, and
not a fraction.
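
Terrot's point about pooling can be made concrete in a few lines. In the sketch below, which is ours and not Terrot's, each observer's evidence is held as a pair (white balls seen, total balls seen), so that ignorance is the pair (0, 0) standing for the form 0/0; the function name `pool` and the numerical values are hypothetical.

```python
from fractions import Fraction


def pool(evidence_a, evidence_b):
    """Combine (p, q) and (r, s) into (p + r)/(q + s).
    Returns None when both observers are ignorant, i.e. the form 0/0."""
    (p, q), (r, s) = evidence_a, evidence_b
    total = q + s
    if total == 0:
        return None  # still indefinite: nothing has been observed
    return Fraction(p + r, total)


informed = (3, 5)                 # hypothetical: 3 white out of 5 seen
print(pool((0, 0), informed))     # ignorance as 0/0 leaves 3/5 untouched
print(pool((1, 2), informed))     # ignorance as 1/2 shifts the answer to 4/7
```

Keeping the numerator and denominator separate is the design choice that matters here: once 1/2 or 0/0 is reduced to a single number the distinction Terrot and Boole draw is lost.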


Jevons notes that “The probability that an event has a particular
condition entirely depends upon the probability that if the condition
existed the event would follow” [1877, p. 240].

This extension, as we have already seen, seems to be due to Lubbock
and Drinkwater-Bethune [c.1830, art. 52].

For a more recent discussion of the value of further information see 
Horwich [1982, pp. 122-129]. 

In the introduction to the first volume of Peirce’s Collected Papers
we read


In the development of exact or mathematical logic his pa-
pers represent the most important and considerable con-
tributions in the period between Boole’s Laws of Thought
and Schröder’s Vorlesungen. [1965, p. iii]


Catholic: general, or orthodox. 

In this section, references of the form {n.m} refer to Volume n, Para- 
graph m, of Peirce’s Collected Papers. 

Under a frequency interpretation probability is seen as an attribute of 
sequences of observations, while the propensity interpretation would 
ascribe probability to experimental conditions. The term “propensity 
interpretation” owes its introduction to Karl Popper (see, for exam- 
ple, his [1968, p. 147]), and support for it can be found in what may 
at first seem an unlikely source, viz. Kolmogoroff [1933, §1.2]. Further 
useful remarks may be found in Vovk [1993]. 

Collectivism, it is perhaps hardly necessary to state, is used here 
in the sense made popular by Richard von Mises, and not with the 
meaning attached to it by socialists.

For comment on Peirce’s criticism of the rule of succession and Boole’s
remarks on “equally probable constitutions” versus “equally probable
ratios” see Hailperin [1986, p. 408].


Peirce writes “we may almost say that ancient history is simply the
narrative of all the unlikely events that happened during the centuries 
it covers” {7.176}. 

Peirce begins his review with the words “Here is a book which should 
be read by every thinking man.” [1867a, p. 317]. 

There is some discussion of Bing’s work in Arne Fisher [1926]. 

The translations in this section are by the Foreign Language Service 
of the South African Council for Scientific and Industrial Research: 
the original texts may be found in Appendix 9.2. 

This application is taken from de Morgan [1838a], chap. III. Bing 
provides a translation of several pages from this essay. 

Bing finds in his paradox an analogy with one discussed by de Morgan 
in his Essay on Probabilities, but declares that the latter’s explanation 
is “most unsatisfactory”. An example similar to Bing’s is given by 
Arne Fisher [1926, §49]. 

Commenting on Laplace’s rule of succession as given by the formula 
(r+1)/(N+2), a rule giving the probability that the next trial will
be a success, Good remarks that “Its square is not supposed to be the 
probability that the next two trials will be successful!” [1965, p. 77] 
(cf. also his p. 19). 
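
Good's warning is easily verified. The sketch below is ours: it assumes the uniform prior underlying the rule of succession, and obtains the probability of two further successes by applying the rule twice, the second time with the first success added to the record. The function names are illustrative only.

```python
from fractions import Fraction


def rule_of_succession(r, n):
    """Probability that the next trial succeeds after r successes in n trials."""
    return Fraction(r + 1, n + 2)


def next_two_successes(r, n):
    """Chain the rule: the first success updates the record to (r + 1, n + 1)."""
    return rule_of_succession(r, n) * rule_of_succession(r + 1, n + 1)


r, n = 3, 4
print(next_two_successes(r, n))          # (r+1)(r+2) / ((n+2)(n+3))
print(rule_of_succession(r, n) ** 2)     # the square, a different number
```

For r = 3 and N = 4 the chained value is 10/21 while the square is 4/9, the discrepancy arising because the trials are not independent once the unknown chance has been averaged out.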

Translation: The only possible form of the function that does not 
give rise to contradictions has thus been proved to be unusable, and 
I would accordingly claim to have demonstrated that there simply 
is no such thing as a posteriori probability in problems in which no 
information is available about causes.

Arne Fisher [1926] finds that criticism of Bing’s work can in fact
be reduced to the question of whether the principle of insufficient 
reason (Boole’s “equal distribution of ignorance”) or the principle of 
cogent reason is used. He claims that while an exact answer can be 
obtained in the former case, precise computation is usually impossible 
in the latter, only an approximate answer being obtainable when one’s 
information is partial and subjective. 

An even stronger statement has been made by de Finetti: see the
preface to his [1974]. 

Jeffreys, in commenting on the use of the uniform prior, wrote 


[many people] appear to have thought that it was an essen- 
tial part of the foundations laid by Laplace that it should 
be adopted in all cases whatever, regardless of the nature of 
the problem. The result has been to a very large extent that 
instead of trying to see whether there was any more satis- 
factory form of the prior probability, a succession of authors 
have said that the prior probability is nonsense and there- 
fore that the principle of inverse probability, which cannot 
work without it, is nonsense too. [1961, p. 120] 


For further comment see Kroman [1908] and Whittaker [1920, §5]. 
Arne Fisher [1926, p. 56] in fact declares that Bing’s views seem to 
have prevailed over those of his critics. 

The surname is incorrectly given as “McAlister” in this first con- 
tribution to the problem. Sir Donald MacAlister, Senior Wrangler at 
Cambridge in 1877, and later a medical man and principal of the Uni- 
versity of Glasgow, produced, in response to a request from Galton, 
the log-normal distribution (see MacKenzie [1981, p. 235]). 

The problem was later discussed in Chapter VII, §17, of the third 
edition of 1888 of Venn’s The Logic of Chance. Venn finds the assumption
of equal a priori probabilities in this case less arbitrary
than usual. 

See Ineichen [1994] for a further discussion of this test, and for a more 
mathematical treatment (in particular the connexion with hypergeo- 
metric functions) see Seneta [1994]. 

According to FitzPatrick [1960] and Mirowski [1994] Edgeworth’s first 
names were originally in reverse order. 

For a study of Edgeworth’s work see Bowley [1928], McCann [1996],
Mirowski [1994] and Stigler [1986a, chap. 9]. Note that Mirowski’s 
views are not completely endorsed by Stigler in his review of 1995. 
Useful though Mirowski’s collection is, the serious reader must be 
warned that there are some misprints there that will necessitate re- 
ferral to the original papers for complete clarification. 

There are numerous references to inverse probability in Edgeworth’s 
writings (see, for example, the Index to Mirowski [1994]): we have 
considered only those that seem particularly relevant.

For a discussion of Gosset’s work on the t-distribution see Welch
[1958].

Bowley [1928] claims that this paper (described by Mirowski [1994, 
p. 34] as a “meditation”) and that of 1922 (considered later in this 
section) provide the best insight into Edgeworth’s original and final 
thoughts on his conception of probability. For a discussion of Edge- 
worth’s compromise between subjectivism and frequentism see Porter 
[1986, p. 259]. 

It might be noted that, while Mirowski sees evidence here of Edge- 
worth’s enmity towards Venn (Mirowski [1994, p. 46]), Stigler, in his 
1995 review of Mirowski’s book, finds only the expression of a now 
outmoded spirit of intellectual honesty. Omnia mutantur! 

The methods and results of this paper were further extended in Edge- 
worth [1886], a paper that is not otherwise relevant to our present 
work. 

Here 24 is either a misprint for 26, or else is the number of letters
in the English alphabet after certain identifications have been made,
presumably i with j and u with v: or could erudite Edgeworth be
referring to the Greek alphabet?

For comments on this formula see Sobel [1987, pp. 170-171].

Edgeworth in fact realizes his example by considering the pattern of
fragments of an exploding shell.

I trust that the reader will not attribute the lack of clarity between 
an estimator and an estimate evinced here to ignorance on the part 
of the author. 

For biographical details on Dodgson, together with an examination 
of his mathematical work, Eperson [1933], Seneta [1984] and [1993], 
and Weaver [1954] & [1956] may be consulted. 

Commentary on Dodgson’s work in logic, in the construction of games 
and puzzles, in the construction of voting systems and photography is 
detailed in Seneta [1993, §1]: his work on logic is discussed in Braith- 
waite [1932], while a number of mathematical pamphlets are exam- 
ined in Abeles [1994]. 

The original title, Pillow-Problems thought out during sleepless nights,
was changed by Dodgson for the second edition to Pillow-Problems 
thought out during wakeful hours, the author stating in the preface 


This last change has been made in order to allay the anx- 
iety of kind friends, who have written to me to express 
their sympathy in my broken-down state of health, believ- 
ing that I am a sufferer from chronic “insomnia”, and that 
it is as a remedy for that exhausting malady that I have 
recommended mathematical calculation. 


Indeed, he goes on to say that it is not 
as a remedy for wakefulness that I have suggested mathematical
calculation; but as a remedy for the harassing
thoughts that are apt to invade a wholly-unoccupied mind. 
I hope the new title will express my meaning more lucidly. 


Exactly what Dodgson meant by “transcendental” here is unclear, 
though I doubt that he is following Humpty Dumpty in taking the 
word to mean what he chooses it to mean. Perhaps it is merely used 
in the sense of being beyond the limits of ordinary experience, or a
priori.

References here are to the second edition of 1893 of Pillow-Problems. 
The bold figures at the end of the quotations give the dates on which 
Dodgson entertained the problems — or was entertained by them! 
For further details of the problems in probability Seneta [1984] may 
be profitably consulted. Seneta is perhaps charitable in saying 


As a probabilist he [i.e. Dodgson] is not important; but 
his work reflects the nature, standing and understanding 
of probability within the wider English mathematical
community of the time. [1993, p. 181]


Dodgson never refers to Bayes’s Theorem by name. 
Weaver comments on Dodgson’s solution of this problem as follows: 


he makes two dreadful mistakes. First he assumes, incor- 
rectly, that the statement implies the probabilities of BB, 
BW and WW (the three possible constitutions of the bag) 
are 1/4, 1/2, and 1/4 respectively. Then he adds a black 
ball to the bag, calculates that the probability of now draw- 
ing a black ball is 2/3 and makes his second fatal error in 
concluding that the bag now must contain BBW. This line 
of reasoning thus leads him to the conclusion that the two 
original balls were one B and one W! This is good Won- 
derland, but very amateurish mathematics. [1956, p. 119] 


I must admit to agreeing with Seneta [1993] in not finding these 
“errors” nearly as dreadful as Weaver considers them to be. 

The article is signed merely with the initials M.W.C. 

As to the meaning of “cause” Crofton states “The term ‘cause’ is
not here used in its metaphysical sense, but as simply equivalent to
‘antecedent state of things’” [p. 773].


This problem, and the next, were also considered by Crofton in the 
chapter he contributed on mean value and probability to Williamson 
[1896]. 

It is almost unnecessary to remark that the word is used in its present 
British, rather than North American, sense. 
The case discussed here is different to that which usually obtains
in a medical setting where our concern is rather with the finding of 
Pr[D|A] given the prior Pr[D] and Pr[A|D] (see for example Elandt- 
Johnson [1971]). 

This book is still viewed with approbation: thus von Plato notes that 
“It has been considered the best philosophical book on probability 
of its time” [1994, p. 169], while Hacking describes it as “The most 
philosophically interesting German work on probability during the 
nineteenth century” [1990, p. 237]. Kamlah views it as “one of the 
first attempts in Germany to overcome the shortcomings of Laplace’s 
classical account” [1983, p. 240], and he says elsewhere that the
Principien “was the most intelligent and sophisticated book on probability
in Germany before World War I” [1987, p. 110]. 

See Porter [1986, p. 86]. Hacking [1990] notes that while the theory 
of von Kries’s book is subjective, the idea of objective probability 
is taken seriously, while von Plato similarly regards it as one of the 
first to introduce the idea of objective probability as motivated by 
statistical physics. Kamlah [1987] states that von Kries’s account of 
probabilities may be understood as a logical interpretation for a cer- 
tain type of probability. Kamlah had noted before that 


For v. Kries, probability is a measure of the expectation of 
an event, but not of the expectation relative to a certain 
given knowledge. It is rather a measure of the justifiable 
expectation of the event under certain conditions. 


[1983, p. 243] 


The formula is attributed by Hardy to Laplace, but its antecedents 
are certainly to be found in Price’s appendix to Bayes’s Essay. 
Good [1965, p. 17] has suggested that “It seems possible that G.F. 
Hardy was the first to suggest a ‘continuum of inductive methods,’ to 
use Carnap’s phrase.” For further comment on the choice of a beta 
prior see Good, op. cit., §§3.2, 4.1. 

Compare §9.6 on Bing’s paradox. 

From China. 

Le Cam, writing on the Central Limit Theorem, says “Bertrand and 
Poincaré wrote treatises on the calculus of probability, a subject 
neither of the two appeared to know. Except for some faint praise 
for Gauss’ circular argument, Bertrand’s book consists mainly of re- 
peated claims that his predecessors made grievous logical mistakes” 
[1986, p. 81]. A full study of Bertrand’s work in probability and the 
theory of errors is Sheynin [1994, §18 in particular]. 

The pagination, though, is different. Sheynin [1994, §1.2] provides 
some evidence in support of the claim that the Calcul des Probabilités 
was first published in 1888, and that at least some of the copies were 
wrongly dated. 
A similar problem is discussed by Stabler [1892], who notes that
Bertrand’s result, when written in the form (m + N × 1/2)/(m + n + N),
is a special case of a result given by Makeham (see §9.15). 

As Keynes [1921, chap. XXX, §11] has pointed out, the further
assumption is needed that the number of balls is infinite: he also gives
the correct solution that obtains when the urn contains only finitely 
many balls. 

For further comment see Sheynin [1994, p. 139 & §8]. 

Whitworth [1897, p. xii] states that, in his opinion, Venn and Chrys- 
tal have both missed the point that chance is a function of one’s 
knowledge, even though that knowledge be both limited and imper- 
fect: indeed, Whitworth describes (op. cit., p. xxvi) as “obstinate” 
Chrystal’s hypothesis that the chance of an event is a property of 
that event and is independent of the observer. 

See his An Investigation of the Laws of Thought [1854, chap. XVI, §3]. 
The definition is repeated in Chrystal [1904, vol. II, p. 567]. 

This is also repeated in Chrystal [1904, vol. II, p. 569]. 

Whitworth [1897, p. xxix], although dissenting with some of Chrys- 
tal’s views, joins him here in his denunciation of the rule of succession. 
This view was supported by Govan [1920, p. 228] in his comments on 
Makeham (see §9.15). 

For comment on Chrystal’s three-ball problem see Zabell [1989a, 
p. 252]. 

Described by Perks [1947, p. 286] as an “unfortunate onslaught”. 
Makeham explicates Laplace’s term by “p is a quantity such that the 
true value of p’’ in any particular urn is just as likely to be above as 
below it” [p. 246]. 

In 1947 Perks proposed p_x dx = dx/[π x^(1/2) (1 − x)^(1/2)] as a new
indifference rule, a rule that yields (m + 1/2)/(m + n + 1) as the
posterior probability of the next trial’s being a success. Perks (op. cit.,
p. 304) notes the conformity between this new result and the
expression (m + k)/(m + n + 2k) as obtained by W.E. Johnson (according
to H. Jeffreys), and also notes that it fits Makeham’s empirical
“general” formula. He finds, however, that “Makeham’s work is marred
by serious confusions of thought” [op. cit., p. 304].

In Gini’s paper of 1949 a yet more general formula is suggested: viz. 
if p is the observed frequency in n events, the probable value of its 
probability is to be taken as (np + k)/(n + k + h), where k and h are
determined by previous experience. 

Makeham’s own words are “If an observed event may be the result 
of one of n different causes; their probabilities are, respectively, as 
the probabilities of the event derived from their existence” [p. 450]: 
it seems clear, however, that the formulation as given in the text is 
what is intended. A similar comment occurs in connexion with the 
fourth principle. 
It would perhaps be more accurate to say that Laplace derived his
rule of succession for sampling from an infinite urn. Zabell [1989b] 
considers the importance in discussions of this rule of (a) whether 
sampling occurs with or without replacement and (b) whether the 
sampling takes place from a finite or an infinite urn. 

Strictly speaking, Poincaré’s work falls outside the scope of this book 
(the first edition of the Calcul des probabilités was published in 1896, 
while Pearson’s Grammar of Science appeared in 1892). It is never- 
theless, I believe, worth including a short discussion of it here. 
Poincaré’s book bears a marked structural resemblance to Bertrand 
[1889]: even five chapter titles are almost the same. Opinions on the 
two works differ, however. Thus Darboux said of Poincaré’s text that 
it “figurera dignement à côté des chefs-d’oeuvre de Laplace et de
Bertrand” [1916, p. xxxiv], and further “Bertrand s’était borné à critiquer
et à démolir. Poincaré a commencé à reconstruire” [loc. cit.].
As we have noted before, Lucien Le Cam, however, was less flattering:
he wrote “Bertrand and Poincaré wrote treatises on the calculus
of probability, a subject neither of the two appeared to know” [1986,
p. 81]. Keynes, on the other hand, struck a middle-of-the-road approach
in his review of the second edition of Poincaré’s book (an
edition that differed from the first in the re-organisation of the mate- 
rial into cohesive chapters rather than the “lecture reprint” form of 
the first edition, and in the presence of an introductory chapter on 
Le Hasard and a final chapter on Questions diverses) when he wrote 


The mathematics remain brilliant and the philosophy su- 
perficial — a combination, especially in the parts dealing 
with geometrical probability, which makes it often sugges- 
tive and often provoking. [1912, p. 114] 


For a discussion of the use of the impulse function in Laplace [1778] 
see Sheynin [1975]. 

Sheynin [1991, p. 152] has noted Poincaré’s consistent use of Bayesian 
methods in his treatment of observations. 

Cajori [1928-1929, §686] states that prior to about 1884 the name 
was spelled “McColl”. Like many of his countrymen of an earlier era 
MacColl evinced a fondness for the French, and he spent several years 
as a mathematics teacher in Boulogne (see Edwards [1967, p. 545]). 

We have already discussed MacColl’s contribution to the solution of 
Boole’s “challenge problem” in §8.17: for further details see Hailperin 
[1996, pp. 133-134]. 

Here, in MacColl’s notation, 


the symbol A/B denotes the chance that A is true on the
assumption that B is true [1897, p. 556], 


while 
The symbol A/ε denotes the chance that A is true on the
assumption that ε is true; but, as ε is understood to be
true throughout, the symbol A/ε simply denotes the chance
that A is true; no assumption being made beyond the data 
of the problem, which are supposed to be always held in 
the recollection and understood when not stated. 


[1897, pp. 556-557] 


See, for example, Clero [1988] and Shafer [1982]. 

If the “arithmetic progression” assumption is dropped, P_x will denote
the event that the correct answer lies between x − dx and x + dx.
Once again we shall eschew MacColl’s notation.

According to J.B.S. Haldane [1957] it was during Pearson’s stay in 
Germany during the early 1880’s that he began to spell his name with 
a K rather than a C.

Porter [1986, p. 274] describes Pearson as “an astute historian of 
science”, while Edgeworth writes of him as 


the author who has made the greatest advance in the science
of Probabilities which has been made since the era of
Poisson. [1896, p. 534]


MacKenzie [1981, p. 73] mentions that Pearson’s writings included 
“poetry, a ‘passion play’, art history, studies of the Reformation and 
mediaeval Germany, philosophy, biography and essays on politics, 
quite apart from his contributions — in the form of over four hundred 
articles — to mathematical physics, statistics and biology.” Pearson 
was also a supporter of the feminist movement. For a partial listing 
of Pearson’s work see E.S. Pearson [1938]: Morant and Welch [1939] 
is also useful.

The second edition of this work was published in 1900, a third edition 
following in 1911. In his biography of his father E.S. Pearson [1938] 
wrote “It was because Pearson felt in later years that the task of 
bringing his Grammar up to date was beyond his powers, that he 
would not consent to its republication although all editions were out 
of print” [p. 132]. 

These examples are also considered in Jeffreys [1961, p. 131]. 

In Chapter I, §5, Pearson avers “The man who classifies facts of any 
kind whatever, who sees their mutual relation and describes their 
sequences, is applying the scientific method and is a man of science.” 
Compare the postulates in the Appendix on Eduction in Johnson
[1924].

See MacKenzie [1981, p. 92]. 

See also Note 12 to Chapter 8 in MacKenzie [1981]. 

MacKenzie’s approach provides a posterior distribution for the
variance σ² in sampling from N(0, σ²); Welch’s yields a posterior
distribution for the correlation coefficient ρ.
Consideration is also given, in Pearson and Filon’s paper, to
multivariate Normal situations.

Porter [1986, p. 306] suggests that Pearson’s philosophy of probability 
was borrowed from Edgeworth. 

See also the discussion in §4.4. 

This expression includes the term B(r+1,s+1) omitted by Laplace. 
On Pearson’s evaluation of the beta-integrals, and the earlier work 
done by Laplace in this connexion, see Molina [1930, p. 376]. 

See Dale [1988b, p. 351]. 

Of this term Moroney [1951, p. 114], citing an unnamed source, says
“it is neither an error nor probable.”

For further discussion of this matter see Jeffreys [1961, §7.2]. 
Further details may be found in Inman [1994].


Appendix 


Reading Note 4 March 1960 
L.J. Savage 

Thomas Bayes, “Essay towards solving a problem in the doctrine of chances,”
The Philosophical Transactions¹, 53 (1763), 370-418.


This famous paper and another from the same volume (269-271) of
the Transactions were reproduced photographically by the Department of
Agriculture in 1941 (?)² with some commentary by W.E. Deming and E.C.
Molina. The other essay shows that Stirling’s series for n! is asymptotic
only; this is apparently the first notice ever taken of asymptotic series.
Both papers were edited posthumously by Bayes’s friend Richard Price,
who made at least some contribution to them³. Biometrika 45 (1958), 293-
315, republished the essay with a biographical note by G.A. Barnard⁴.

Though the essay is not long, it is rich, and I find need to prepare my- 
self a special sort of abstract of it. My interest for this purpose is not in 
mathematical aspects of the paper such as validity of demonstrations but 
in certain ideas. In what way, or ways, does Bayes view probability? What 
propositions does he consider important? In what form is “Bayes’s Theo- 
rem” among them? 

The essence of the essay is stated to be this “Problem”: “Given the num- 
ber of times in which an unknown event has happened and failed: Required 
the chance that the probability of its happening in a single trial lies some- 
where between any two degrees of probability that can be named.” Bayes 
says elsewhere that chance and probability are synonymous for him and 
seems to stick by that. The problem is of the kind we now associate with 
Bayes’s name, but it is confined from the outset to the special problem of 
drawing the Bayesian inference, not about an arbitrary sort of parameter, 
but about a “degree of probability” only. 

Price, in a letter introducing the essay, says that Bayes saw clearly how 
to solve his problem if an initial distribution were given and that Bayes 
thought there was good but not perfect reason to postulate the uniform 
prior distribution. It is thus with good reason that the term “Bayes’s Pos- 
tulate” is sometimes used for this assumption. What Price says convinces 
me that Bayes was aware of Bayes’s theorem⁵ in full generality, except
possibly that he confined himself to unknown “degrees of probability.” Price
declares the problem to be central to the philosophy of induction and
therefore to the “argument taken from final causes for the existence of the Deity.”
De Moivre’s theorem⁶ he holds as nothing compared to this in importance;
the converse problems must not be confused with each other.

After baldly stating his Problem, Bayes presents, as Section I, a whole 
short course on probability. Using modern terms freely, it may be para- 
phrased thus: 


Definitions: 
1. Inconsistent events = incompatible events. 
2. Contrary events = a two-fold partition, a dichotomy. 


3. “An event is said to fail, when it cannot happen; or, which comes to 
the same thing, when its contrary has happened.” 


4. “An event is said to be determined when it has either happened or 
failed.” 


5. “The probability of any event is the ratio between the value at which 
an expectation depending on the happening of the event ought to be 
computed, and the value of the thing expected upon it’s [sic] happen- 
ing.” 


6. “By chance I mean the same as probability.” 


7. “Events are independent when the happening of any one of them does 
neither increase nor abate the probability of the rest.” 


These definitions could provoke many remarks. For example, apparently 
Bayes thinks that pairwise independence is independence. The definition 
of probability is of course most interesting. Apparently, an expectation was 
clearly understood by contemporaries as a payment contingent on an event, 
and such things must have sometimes been bought, sold, and used as col- 
lateral. What does “ought” mean? 

Next some Propositions and Corollaries are stated: I have not taken
great pains to check the derivations, but in general they take Bayes’s defi- 
nition of probability seriously; they are beclouded by the idea that numbers 
are a little more shameful than ratios. 


Prop. 1. Simple additivity. 


Cor. The sum of probabilities over a partition and, in particular, over a 
dichotomy is 1. 
Prop. 2. “If a person has an expectation depending on the happening of an
event, the probability of the event is to the probability of its failure as his 
loss if it fails to his gain if it happens.” 


That is, the odds p/q for winning a simple fair lottery is the ratio of
the price of the ticket to the prize minus the price of the ticket.


Prop. 3. Pr[A and B] = Pr [A] Pr[B|A]. 
Cor. Pr[B|A] = Pr[A and B] / Pr [A]. 


Prop. 4. “If there be two subsequent events to be determined every day, 
and each day the probability of the 2nd is b/N and the probability of both 
P/N, and I am to receive N if both the events happen the first day on 
which the 2nd does; I say, according to these conditions, the probability of 
my obtaining N is P/b.” 


That is, if A_i and B_i are independent from index to index, the probability
that the first occurrence of a B_i will be accompanied by that of the
corresponding A_i is what it should be.
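
In modern notation, and assuming (as a reading of the paraphrase, not Bayes’s own wording) that on each day Pr[B_i] = b/N and Pr[A_i and B_i] = P/N with b > 0, and that different days are independent, “what it should be” can be made explicit by summing over the day k on which the 2nd event first happens:

```latex
\Pr[\text{first } B_i \text{ is accompanied by } A_i]
  = \sum_{k=1}^{\infty} \left(1 - \frac{b}{N}\right)^{k-1} \frac{P}{N}
  = \frac{P/N}{b/N}
  = \frac{P}{b}
  = \Pr[A \mid B].
```

The geometric series converges because 0 < b/N ≤ 1, and the result agrees with the corollary to Prop. 3.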


Cor. “Suppose after the expectation given me in the foregoing proposition,
and before it is at all known whether the 1st event has happened or
not, I should find that the 2nd event has happened; from hence I can only 
infer that the event is determined on which my expectation depended, and 
have no reason to esteem the value of my expectation either greater or less 
than it was before. ... But the probability that an event has happened is 
the same as the probability I have to guess right if I guess it has happened. 
Wherefore the following proposition is evident.” 


Prop. 5. “If there be two subsequent events, the probability of the 2nd 
b/N and the probability of both together P/N, and it being first discov- 
ered that the 2nd event has happened, from hence I guess that the 1st event 
has also happened, the probability I am in the right is P/b.”⁷


Prop. 6. Product rule for “several independent events.” 


Cor. 1. The probability of prescribed sequences of successes and failures 
of independent events. 


Cor. 2. Ditto when all these events have probability a.


Definition. In effect, defines Bernoulli trials. Here Bayes deliberately
introduces an ambiguity that helps and hinders us to this day: “And hence
it is manifest that the happening or failing of the same event in so many
diffe[rent] trials, is in reality the happening or failing of so many distinct
independent events exactly similar to each other.”


Prop. 7. Derives the binomial distribution for Bernoulli trials. 


This concludes Section I, the short course on first principles. It is ad- 
mirable and shows good insight into conditional probability, but there is 
no trace of what we think of as characteristic of Bayes, a theorem about 
the probability of causes. The germ of that, but the germ only, is to come 
in the next and final section. 


Section II 


Straining over the rigoritis of his own time, but showing perfectly mod- 
ern insight into the thing itself, Bayes describes a schematic Monte Carlo 
procedure based on a levelled table and two balls. Throwing the first ball 
once selects an a uniformly between 0 and 1. Then, throwing the second 
ball n times yields n trials, that, given a, are independent with probability 
a. 
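
The two-ball scheme just described can be imitated on a computer. The following is only a sketch under stated assumptions: the table is idealised as the unit interval, a “success” is taken to be the second ball coming to rest short of the first (so that it has probability a), and the names `throw_table` and `success_count_frequencies` are illustrative inventions, not anything in Bayes or Savage.

```python
import random

def throw_table(n, rng):
    """One run of the two-ball scheme: returns (a, number of successes in n trials)."""
    a = rng.random()  # first ball: position a, uniform on [0, 1)
    # second ball thrown n times; each throw succeeds with probability a
    successes = sum(rng.random() < a for _ in range(n))
    return a, successes

def success_count_frequencies(n, runs, seed=0):
    """Relative frequency of each success count 0..n over many independent runs."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(runs):
        _, p = throw_table(n, rng)
        counts[p] += 1
    return [c / runs for c in counts]
```

Calling `success_count_frequencies(4, 100_000)` returns five relative frequencies, one for each possible number of successes, which can be compared with whatever distribution theory predicts for the count.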


Prop. 8. Calculates by means of a beta integral the probability that a 
will fall in a preassigned interval and p of the n trials will be successful. 


Cor. Gives the probability of just p successes indirectly in terms of the 
ratio of a beta integral to a binomial coefficient. Bayes knows that this is 
the same, namely 1/(n + 1), for all p, but he is too formal to mention it 
here. A little later he adduces this uniformity in p as a particularly telling 
justification for Bayes’s postulate as a description of the blank mind. 
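
The uniformity in p noted in this corollary can be checked exactly for small n. The sketch below assumes the modern reading that the probability of p successes is the binomial coefficient times the integral of a^p (1 − a)^(n − p) over [0, 1]; the function names are hypothetical, and the integral is evaluated by expanding (1 − a)^(n − p) and integrating term by term rather than by quoting the closed form of the beta integral.

```python
from fractions import Fraction
from math import comb

def integral_of_monomial_product(p, m, lo=Fraction(0), hi=Fraction(1)):
    """Exact integral of a**p * (1 - a)**m over [lo, hi].

    Expands (1 - a)**m by the binomial theorem and integrates term by term.
    """
    total = Fraction(0)
    for j in range(m + 1):
        k = p + j + 1
        total += Fraction((-1) ** j * comb(m, j), k) * (hi ** k - lo ** k)
    return total

def prob_successes(p, n):
    """Pr[p successes in n trials] when a is uniform on [0, 1]."""
    return comb(n, p) * integral_of_monomial_product(p, n - p)
```

For every n and p tried, `prob_successes(p, n)` comes out equal to `Fraction(1, n + 1)`, independently of p.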


Prop. 9. Gives the probability of any interval in a given p (couched in 
the language of guessing). 


Cor. Gives the cumulative distribution of a given p. 
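
In modern terms Prop. 9 and its corollary amount to a ratio of an incomplete to a complete beta integral. A minimal sketch, assuming a uniform prior on a and using exact rational arithmetic (the helper names are illustrative only):

```python
from fractions import Fraction
from math import comb

def beta_integral(p, m, lo, hi):
    """Exact integral of a**p * (1 - a)**m over [lo, hi], by binomial expansion."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(
        Fraction((-1) ** j * comb(m, j), p + j + 1)
        * (hi ** (p + j + 1) - lo ** (p + j + 1))
        for j in range(m + 1)
    )

def posterior_interval(p, n, lo, hi):
    """Pr[lo < a < hi | p successes in n trials], uniform prior on a."""
    return beta_integral(p, n - p, lo, hi) / beta_integral(p, n - p, 0, 1)

def posterior_cdf(p, n, x):
    """Pr[a < x | p successes in n trials]: the cumulative form of the corollary."""
    return posterior_interval(p, n, 0, x)
```

For instance, after one success in one trial, `posterior_interval(1, 1, Fraction(1, 2), 1)` is 3/4.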


Scholium. A deliberately extra-mathematical argument in defense of Bayes’s 
postulate, already mentioned by me in connection with the corollary to 
Proposition 8. 


Prop. 10. Restates Prop. 9 in terms of the newly gained knowledge of the 
value of the complete beta integral. 


The essay concludes with practical rules for computing the incomplete 
beta integral, which is outside the province of this abstract. 

Price has an appendix of numerical examples and philosophy. Below is an 
instance that Laplace has later made famous. Such a discussion necessarily 
seems old-fashioned today, but the second paragraph is far from naive and 
the third seems important against those who take universals seriously.

“Let us imagine to ourselves the case of a person just brought forth into 
this world, and left to collect from his observation of the order and course 
of events what powers and causes take place in it. The Sun would, prob- 
ably, be the first object that would engage his attention; but after losing 
it the first night he would be entirely ignorant whether he should ever see 
it again. He would therefore be in the condition of a person making a first 
experiment about an event entirely unknown to him. But let him see a 
second appearance or one return of the Sun, and an expectation would be 
raised in him of a second return, and he might know that there was an odds 
of 3 to 1 for some probability of this. This odds would increase, as before 
represented, with the number of returns to which he was witness. But no 
finite number of returns would be sufficient to produce absolute or physi- 
cal certainty. For let it be supposed that he has seen it return at regular 
and stated intervals a million of times. The conclusions this would warrant 
would be such as follow. There would be the odds of the millionth power 
of 2, to one, that it was likely that it would return again at the end of the 
usual interval. There would be the probability expressed by 0.5352, that 
the odds for this was not greater than 1,600,000 to 1; and the probability 
expressed by 0.5105, that it was not less than 1,400,000 to 1. 

“It should be carefully remembered that these deductions suppose a pre- 
vious total ignorance of nature. After having observed for some time the 
course of events it would be found that the operations of nature are in gen- 
eral regular, and that the powers and laws which prevail in it are stable and 
permanent. The consideration of this will cause one or a few experiments 
often to produce a much stronger expectation of success in further exper- 
iments than would otherwise have been reasonable; just as the frequent 
observation that things of a sort are disposed together in any place would 
lead us to conclude, upon discovering there any object of a particular sort, 
that there are laid up with it many others of the same sort. It is obvious 
that this, so far from contradicting the foregoing deductions, is only one 
particular case to which they are to be applied. 

“What has been said seems sufficient to shew us what conclusions to 
draw from uniform experience. It demonstrates, particularly, that instead 
of proving that events will always happen agreeably to it, there will be 
always reason against this conclusion. In other words, where the course 
of nature has been the most constant, we can have only reason to reckon 
upon a recurrency of events proportioned to the degree of this constancy; 
but we can have no reason for thinking that there are no causes in nature 
which will ever interfere with the operations of the causes from which this 
constancy is derived, or no circumstances of the world in which it will fail. 
And if this true, supposing our only data derived from experience, we shall 
find additional reason for thinking thus if we apply other principles, or have 
recourse to such considerations as reason, independently of experience, can 
suggest.” 
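
The figures in the first paragraph of Price’s passage can be reproduced numerically. The sketch below rests on assumptions that are mine rather than Price’s: a uniform prior on the chance a, n returns observed with no failures (so that the posterior cumulative distribution is x^(n + 1)), and “odds of k to 1” read as a = k/(k + 1).

```python
import math

def prob_chance_below(odds, returns):
    """Pr[a < odds/(odds+1) | `returns` successes, no failures], uniform prior.

    The posterior cumulative distribution here is x ** (returns + 1).
    """
    log_x = math.log1p(-1.0 / (odds + 1.0))  # log of odds/(odds + 1)
    return math.exp((returns + 1) * log_x)

# One return: probability that a exceeds 1/2, i.e. "odds of 3 to 1".
one_return = 1.0 - prob_chance_below(1, 1)                 # about 0.75
# A million returns: the two bounds quoted by Price.
not_greater = prob_chance_below(1_600_000, 1_000_000)      # about 0.5353
not_less = 1.0 - prob_chance_below(1_400_000, 1_000_000)   # about 0.5105
```

On these assumptions the computed values agree with Price’s 0.5352 and 0.5105 to within a unit in the fourth decimal place.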
Notes


1. Often referenced as Phil. Trans. Roy. Soc. London.


2. Actually 1940. Facsimiles of two papers by Bayes. (ed. W.E. Deming.)
The Graduate School, United States Department of Agriculture,
Washington.


3. The letter on series bears the legend “Read Nov. 24, 1763” (i.e. after
Bayes’s death), and while it begins


If the following observations do not seem to you to be too
minute, I should esteem it as a favour, if you would please
to communicate them to the Royal Society [p. 269],


I can find no further evidence that the paper was communicated, let
alone edited, by Price. [A.I.D.]


4. A further publication is in Studies in the History of Statistics and
Probability (ed. E.S. Pearson and M.G. Kendall.) London: Griffin
(1970).


5. The suggestion has been made that Bayes is not the originator of
the theorem that is now named after him. See S.M. Stigler (1983),
American Statistician 37, 290-296.


6. It is not quite clear what is meant here; presumably the binomial
distribution and its normal approximation. See A.W.F. Edwards (1986),
American Statistician 40, 109-110.


7. A careful analysis of this has been given by G. Shafer (1982),
Annals of Statistics 10, 1075-1089.


EPIPHONEMA 


Non est necesse hæc omnia ad felicitatem observare;
sed tamen qui hæc omnia observaverit felix
erit.

Longe autem facilius est hæc scire quam exequi.


Girolamo Cardano,
“Præceptorum ad filios libellus”.